After cleaning and exploring the data in Part 1, we’ve now loaded it to move forward with model generation and regression analysis.

knitr::opts_chunk$set(echo = TRUE)

Loading packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading and Reading data

trained_data <- readr::read_rds("trained_data.rds")

The readr::read_rds() function restores the trained tibble saved at the end of Part 1 (unlike readr::read_csv(), it does not print the column specification). A glimpse() is shown below that reveals the number of rows, the data type of each column, and some representative values.

trained_data %>% class()
## [1] "tbl_df"     "tbl"        "data.frame"
trained_data %>% glimpse()
## Rows: 835
## Columns: 12
## $ R                <dbl> 172, 26, 172, 28, 170, 175, 90, 194, 171, 122, 0, 88,…
## $ G                <dbl> 58, 88, 94, 87, 66, 89, 78, 106, 68, 151, 121, 140, 8…
## $ B                <dbl> 62, 151, 58, 152, 58, 65, 136, 53, 107, 59, 88, 58, 1…
## $ Lightness        <chr> "dark", "dark", "dark", "dark", "dark", "dark", "dark…
## $ Saturation       <chr> "bright", "bright", "bright", "bright", "bright", "br…
## $ Hue              <dbl> 4, 31, 8, 32, 5, 6, 34, 10, 1, 21, 24, 22, 36, 16, 26…
## $ response         <dbl> 12, 10, 16, 10, 11, 16, 10, 19, 14, 25, 14, 19, 14, 3…
## $ outcome          <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Lightness_Label  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Saturation_Label <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ y                <dbl> -1.9924302, -2.1972246, -1.6582281, -2.1972246, -2.09…
## $ logit_Hue        <dbl> -2.36712361, 1.79175947, -1.38629436, 2.04769284, -2.…

The data consist of continuous and categorical inputs. The glimpse() output above shows each variable's data type, which indicates whether an input is continuous or categorical. The RGB color model inputs, R, G, and B, are continuous (dbl). The HSL color model inputs consist of two categorical inputs, Lightness and Saturation, and one continuous input, Hue. Two outputs are provided: the continuous output, response, and the binary output, outcome. The binary outcome is stored as a numeric type because it is encoded as outcome = 1 for the EVENT and outcome = 0 for the NON-EVENT. Lightness_Label and Saturation_Label are integer labels for Lightness and Saturation, and y is the logit-transformed response.

Regression

According to the instructions:

As stated in the project guidelines, you will not model the continuous output, response, directly. The response is a bounded variable between 0 and 100. The response must be transformed to an unbounded variable to appropriately be modeled by a Gaussian likelihood. We are making this transformation because we want the uncertainty in the predicted output to also satisfy output constraints. If we did not make this transformation the uncertainty could violate the bounds, which would mean the model is providing unphysical results! By logit-transforming response, we will fully respect the bounds of the output variable.

The response has already been logit-transformed to an unbounded variable (the y column) as part of the data cleaning, so it can be appropriately modeled by a Gaussian likelihood.
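
As a concrete illustration (a minimal sketch, not the actual cleaning code from Part 1), the transformation maps the bounded response to a proportion and applies the logit. The y values in the glimpse above are consistent with this: logit(12/100) ≈ -1.9924 and logit(10/100) ≈ -2.1972 match the first two rows.

```r
# Sketch of the logit transformation assumed from Part 1:
# squeeze the bounded response (0 to 100) to a proportion, then apply the logit
response <- c(12, 10, 16)            # example bounded values from the glimpse
p <- response / 100                  # map to the (0, 1) interval
y <- log(p / (1 - p))                # logit transform; equivalently qlogis(p)

# The transform is invertible, so predictions made on the unbounded y
# scale can be mapped back inside the (0, 100) bounds:
back <- 100 * exp(y) / (1 + exp(y))  # equivalently 100 * plogis(y)
```

This invertibility is exactly why predictive uncertainty on the y scale can never violate the physical bounds once it is mapped back.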

iiA) Linear models

Using lm() to fit the following linear models:

  • Intercept-only model – no INPUTS!
  • Categorical variables only – linear additive
  • Continuous variables only – linear additive
  • All categorical and continuous variables – linear additive
  • Interaction of the categorical inputs with all continuous input main effects
  • Categorical inputs added to all main effects and all pairwise interactions of continuous inputs
  • Interaction of the categorical inputs with all main effects and all pairwise interactions of continuous inputs
  • Non-linear basis functions based on the EDA – can consider interactions of basis functions with other basis functions and with the categorical inputs!

# Model 1: Intercept-only model
mod1 <- lm(y ~ 1, data = trained_data)

# Model 2: Categorical variables only – linear additive
mod2 <- lm(y ~ Lightness + Saturation, data = trained_data)

# Model 3: Continuous variables only - linear additive
mod3 <- lm(y ~ R + G + B + Hue, data = trained_data)

# Model 4: All categorical and continuous variables - linear additive
mod4 <- lm(y ~ Lightness + Saturation + R + G + B + Hue, data = trained_data)

# Model 5: Interaction of categorical inputs with all continuous inputs main effects
mod5 <- lm(y ~ Lightness * R + Lightness * G + Lightness * B + Lightness * Hue + 
            Saturation * R + Saturation * G + Saturation * B + Saturation * Hue, data = trained_data)

# Model 6: Add categorical inputs to all main effect and all pairwise interactions of continuous inputs
mod6 <- lm(y ~ (R + G + B + Hue)^2 + Lightness + Saturation, data = trained_data)

# Model 7: Interaction of the categorical inputs with all main effect and all pairwise interactions of continuous inputs
mod7 <- lm(y ~ (Lightness + Saturation) * (R + G + B + Hue)^2, data = trained_data)

library(splines)
# Model 8: Model with basis functions of your choice
mod8 <- lm(y ~ ns(Hue, df = 3), data = trained_data)

# Model 9: Model with non-linear basis functions based on EDA
# Using natural cubic splines for continuous variables
mod9 <- lm(y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3), data = trained_data)

# Model 10: Model considering interactions of basis functions with other basis functions and categorical inputs
# Interaction of spline basis functions with LightnessNum
mod10 <- lm(y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness, data = trained_data)

Which of the 10 models is the best?

  • What performance metric did you use to make your selection?

Calculating all the metrics and then selecting the best model based on the chosen criteria.

extract_metrics <- function(mod, mod_name)
{
  broom::glance(mod) %>% mutate(mod_name = mod_name)
}

The extract_metrics() function is applied to each lm() model object and the results are compiled into a dataframe with purrr::map2_dfr().

all_metrics <- purrr::map2_dfr(list(mod1, mod2, mod3, mod4, mod5, mod6,mod7,mod8,mod9,mod10),
                               as.character(1:10),
                               extract_metrics)
all_metrics %>% glimpse()
## Rows: 10
## Columns: 13
## $ r.squared     <dbl> 0.00000000, 0.88463262, 0.98810380, 0.99450819, 0.997808…
## $ adj.r.squared <dbl> 0.00000000, 0.88294843, 0.98804647, 0.99440077, 0.997626…
## $ sigma         <dbl> 1.18434210, 0.40519660, 0.12948675, 0.08862198, 0.057703…
## $ statistic     <dbl> NA, 525.25537, 17235.04027, 9258.18445, 5477.47518, 8494…
## $ p.value       <dbl> NA, 0.000000000, 0.000000000, 0.000000000, 0.000000000, …
## $ df            <dbl> NA, 12, 4, 16, 64, 22, 142, 3, 12, 69
## $ logLik        <dbl> -1325.5849, -423.9378, 524.5814, 847.2924, 1230.8036, 94…
## $ AIC           <dbl> 2655.1698, 875.8757, -1037.1628, -1658.5849, -2329.6072,…
## $ BIC           <dbl> 2664.6246, 942.0597, -1008.7982, -1573.4911, -2017.5967,…
## $ deviance      <dbl> 1169.823619, 134.959484, 13.916459, 6.424454, 2.563881, …
## $ df.residual   <int> 834, 822, 830, 818, 770, 812, 692, 831, 822, 765
## $ nobs          <int> 835, 835, 835, 835, 835, 835, 835, 835, 835, 835
## $ mod_name      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"

A glimpse of the all_metrics object is shown above. Each row is a model and the columns correspond to various performance metrics associated with the training set performance.

Now that we have fit multiple models of varying complexity, it is time to identify the best performing model. I will be identifying the best model considering training set only performance metrics. I want to consider R-squared, AIC and BIC to find the best models from the above output.

Which of the 10 models is the best?

=> What performance metric did you use to make your selection?

I have considered 3 performance metrics to get the best model. Based on the above outputs we can say that:

Lower values of AIC and BIC indicate better-fitting models. Model 10 has the lowest BIC value (-2158.44), suggesting that it provides the best balance of goodness of fit and model complexity among all the models evaluated.

Model 7 has the lowest AIC value, -2683.22.

R-squared represents the proportion of the variance in the dependent variable (y) that is explained by the independent variables in the model. Higher R-squared values indicate better explanatory power. Model 7 has the highest R-squared, with Model 10 close behind at approximately 0.998, so both models explain essentially all of the variance in the response variable.

In summary, weighing AIC, BIC, and R-squared together, Model 10 emerges as the top-performing model among all the models evaluated. Its fit is essentially as good as Model 7's, but it achieves that fit with far fewer coefficients.

Although Model 10 is only second best on R-squared and AIC, we consider it the best model because we are concerned with fitting the data using a minimal number of coefficients, and BIC is the metric that rewards exactly that.

Hence we can say that Model 10, i.e. the "interaction of spline basis functions with Lightness" model, stands out as the best of the 10 models we have generated.

Visualization through graph

As we are considering R-squared, AIC, and BIC, these three metrics are visualized in a single graphic in the figure below. The x-axis is the model name displayed as an integer. The facets correspond to the metrics, with AIC on the left, BIC in the middle, and R-squared on the right.

all_metrics %>% 
  select(mod_name, df, r.squared, AIC, BIC) %>% 
  pivot_longer(!c("mod_name", "df")) %>% 
  ggplot(mapping = aes(x = mod_name, y = value)) +
  geom_point(size = 5) +
  facet_wrap(~name, scales = "free_y") +
  theme_bw()

From the above plot we can say that, the top three models based on R-squared are 7, 10, and 5. According to AIC, the top three models are also 7, 10, and 5. Similarly, based on BIC, the top three models are 10, 9, and 5.

Even though model 7 shows the highest R-squared on the training set, indicating superior performance with the training data, both AIC and BIC penalize models based on their estimated parameters. Lower AIC and BIC values are preferred, indicating better model fit with fewer parameters.

While model 7 may seem to perform the best on the training set, the penalty term in BIC suggests that its added flexibility is not justified. Model 5 consistently ranks third best (third lowest) on both AIC and BIC, suggesting a reasonable balance between model complexity and data fit. Thus, despite model 10's slightly lower R-squared, its simpler structure may lead to better generalization to new data than model 7, which could overfit due to its higher complexity.
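
To make the penalty trade-off concrete, both criteria can be recomputed by hand from the log-likelihood: AIC = 2k − 2·logLik and BIC = k·log(n) − 2·logLik, where k counts the estimated parameters (including σ). A small sketch on a built-in dataset (mtcars here, purely as a stand-in for our models):

```r
# BIC's per-parameter penalty, log(n), exceeds AIC's penalty of 2 once
# n > e^2 (about 7.4), so with n = 835 observations BIC punishes extra
# coefficients much more harshly than AIC does.
fit <- lm(mpg ~ wt + hp, data = mtcars)

k  <- attr(logLik(fit), "df")   # number of estimated parameters (incl. sigma)
n  <- nobs(fit)
ll <- as.numeric(logLik(fit))

aic_manual <- 2 * k - 2 * ll        # matches AIC(fit)
bic_manual <- log(n) * k - 2 * ll   # matches BIC(fit)
```

This is why the very flexible mod7 (142 features) wins on AIC but loses to mod10 on BIC.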

Visualizing the coefficient summaries for the top 3 models

top_5_models <- all_metrics %>%
  arrange(desc(r.squared)) %>%
  head(5)
print(top_5_models)
## # A tibble: 5 × 13
##   r.squared adj.r.squared  sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.999         0.999 0.0449     4089.       0   142  1486. -2683. -2002.
## 2     0.998         0.998 0.0521     6223.       0    69  1318. -2494. -2158.
## 3     0.998         0.998 0.0577     5477.       0    64  1231. -2330. -2018.
## 4     0.997         0.997 0.0666    21893.       0    12  1083. -2139. -2073.
## 5     0.996         0.996 0.0789     8494.       0    22   947. -1846. -1732.
## # ℹ 4 more variables: deviance <dbl>, df.residual <int>, nobs <int>,
## #   mod_name <chr>

From the above output, the top 3 models are Model 10, Model 9, and Model 5 with respect to BIC, as in this scenario we are more concerned with BIC than with the other metrics.

Coefplot for mod10:

mod10 %>% coefplot::coefplot() +
  theme_bw() +
  theme(legend.position = 'none')

mod10 %>% coef() %>% length()
## [1] 70

Coefplot for mod9:

mod9 %>% coefplot::coefplot() +
  theme_bw() +
  theme(legend.position = 'none')

mod9 %>% coef() %>% length()
## [1] 13

Coef plot for mod5:

mod5 %>% coefplot::coefplot() +
  theme_bw() +
  theme(legend.position = 'none')

mod5 %>% coef() %>% length()
## [1] 65

From the above plot for mod5 we can say that, among the 65 coefficients examined, only 11 show statistical significance, as their 95% confidence intervals do not include zero. These significant features include Lightness - pale, Lightness - light, Lightness - soft, Lightness - midtone, Saturation - neutral, Saturation - gray, Saturation - shaded, Saturation - subdued, and Saturation - muted.

From the coefficient summary plot for mod9 above, we can draw several conclusions:

The model utilizes natural cubic splines with 3 degrees of freedom for each continuous variable (R, G, B, and Hue) to capture nonlinear relationships with the response variable (y). It ranks second according to the Bayesian Information Criterion (BIC) values among the considered models and comprises 13 coefficients, including the intercept and coefficients for each natural spline.

Of the 13 coefficients, only the natural spline terms for Hue are not statistically significant, suggesting insufficient evidence to reject the null hypothesis that these coefficients are zero. The ten statistically significant coefficients, particularly those for the natural splines of R and G, indicate a strong relationship with the response variable.

From the coefficient summary, it can be inferred that the significant coefficients for the natural splines of R and G highlight a substantial relationship between the red (R) and green (G) color components and the outcome variable. This implies that changes in the intensity of red and green may significantly impact the outcome variable, while changes in blue (B) and hue may not be as influential.

Overall, the model suggests that variations in R and G values are closely associated with changes in the outcome variable compared to variations in B and hue values.

Coef plot for mod10:

We can say that,

The coefficient summary plot depicts numerous features within the model, none of which appear to be statistically significant. Model mod10 incorporates a far greater number of features than the inputs in our dataset, highlighting the capacity to derive numerous features from the inputs. It’s important to note that the number of features in a linear model does not necessarily correspond to the number of inputs.

As illustrated in the plot, mod10 comprises 69 features in addition to the intercept, totaling 70 coefficients that require estimation. The inclusion of spline basis functions and their interactions with Lightness has fundamentally altered which features appear significant. Notably, the linear main-effect features are no longer statistically significant, and neither are the interactions between inputs.

Which inputs seem important?

Inputs with significant coefficients are deemed important predictors. In models 5 and 9, these include Lightness - pale, Lightness - light, Lightness - soft, Lightness - midtone, Saturation - neutral, Saturation - gray, Saturation - shaded, Saturation - subdued, Saturation - muted, R, G, and B.

From all the above observations , Model 10 is the best model.

Part ii: Regression – iiB) Bayesian Linear models

You have explored the relationships; next you must consider the UNCERTAINTY on the residual error through Bayesian modeling techniques!

Fit 2 Bayesian linear models – one must be the best model from iiA) and the second must be another model you fit in iiA)

We know that the best model from Part iiA is Model 10.

=> Model 10 considers interactions between basis functions (natural splines for R, G, and B) and a categorical input variable, Lightness. This model captures the combined effect of the spline basis functions with the Lightness variable, allowing for a more nuanced representation of the relationship between the predictor variables and the response.

Another Model - Model 9

Below, I perform the Bayesian analysis using the Laplace Approximation. I will fit the Bayesian linear model with the Laplace Approximation by programming the log-posterior function. I would like to understand how the model behaves with a weak prior and with a strong prior.

Creating the Design Matrix for model 10 and model 9

Creating the design matrix following mod10's formula and assigning the object to the X10 variable. Let's first proceed by creating the design matrix for Model 10:

mod10_formula <- y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness

# Extract the design matrix for Model10
X10 <- model.matrix(mod10_formula, data = trained_data)

info_mod10_weak <- list(
  yobs = trained_data$y,  # Using trained_data instead of df
  design_matrix = X10,
  mu_beta = 0,
  tau_beta = 50,
  sigma_rate = 1
)

Design Matrix for Model 9

# Define the model formula for Model9
mod9_formula <- y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3)

# Extract the design matrix for Model9
X9 <- model.matrix(mod9_formula, data = trained_data)

# Define the list of required information for Model9
info_mod9_weak <- list(
  yobs = trained_data$y,
  design_matrix = X9,
  mu_beta = 0,
  tau_beta = 50, # Prior standard deviation
  sigma_rate = 1 # Rate parameter on the noise
)

Now that we are done creating the design matrices, we will define the log-posterior function lm_logpost(). We will continue to use the log-transformation on σ, and so we will actually define the log-posterior in terms of the mean trend β-parameters and the unbounded noise parameter, φ=log[σ].

Defining the log-posterior function.

Using the Log-transformation on σ, and define the log-posterior in terms of the mean trend β-parameters and the unbounded noise parameter, φ=log[σ].
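
The change of variables brings a Jacobian adjustment: the prior density on φ is p(σ(φ))·|dσ/dφ| = p(σ)·exp(φ), whose log contributes the extra φ term in the log-posterior. A quick numerical sanity check of this (my own sketch, assuming the Exponential prior with rate 1 specified in the info lists):

```r
# The prior on sigma is Exponential(rate = 1). After transforming to
# varphi = log(sigma), the density picks up the Jacobian exp(varphi):
prior_varphi <- function(varphi, rate = 1) {
  dexp(exp(varphi), rate = rate) * exp(varphi)
}

# A valid density must still integrate to 1 over the real line;
# the integrand is negligible outside roughly (-15, 10):
total_mass <- integrate(prior_varphi, lower = -15, upper = 10)$value
```

Without the exp(varphi) Jacobian factor the transformed "density" would not integrate to 1, which is exactly the bug the log-derivative adjustment prevents.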

The unknown parameters to learn are contained within the first input argument, unknowns. The assumption is that the unknown β-parameters are listed before the unknown φ parameter in the unknowns vector. The number of β parameters is specified programmatically so the function scales to an arbitrary number of unknowns. All variables contained in the my_info list (the second argument to lm_logpost()) are assumed to be the same fields as in the info_mod9_weak and info_mod10_weak lists.

# Define the log-posterior function lm_logpost() (shared by Model 9 and Model 10)
lm_logpost <- function(unknowns, my_info) {
  # specify the number of unknown beta parameters
  length_beta <- ncol(my_info$design_matrix)
  
  # extract the beta parameters from the `unknowns` vector
  beta_v <- unknowns[1:length_beta]
  
  # extract the unbounded noise parameter, varphi
  lik_varphi <- unknowns[length_beta + 1]
  
  # back-transform from varphi to sigma
  lik_sigma <- exp(lik_varphi)
  
  # extract design matrix
  X <- my_info$design_matrix
  
  # calculate the linear predictor
  mu <- as.vector(X %*% as.matrix(beta_v))
  
  # evaluate the log-likelihood
  log_lik <- sum(dnorm(x = my_info$yobs,
                       mean = mu,
                       sd = lik_sigma,
                       log = TRUE))
  
  # evaluate the log-prior for beta
  log_prior_beta <- sum(dnorm(x = beta_v,
                              mean = my_info$mu_beta,
                              sd = my_info$tau_beta,
                              log = TRUE))
  
  # evaluate the log-prior for sigma
  log_prior_sigma <- dexp(x = lik_sigma,
                          rate = my_info$sigma_rate,
                          log = TRUE)
  
  # add the log-priors together
  log_prior <- log_prior_beta + log_prior_sigma
  
  # account for the change of variables: log|d sigma / d varphi| = varphi
  log_derive_adjust <- lik_varphi
  
  # return the sum of log-likelihood, log-prior, and log-derivative adjustment
  log_lik + log_prior + log_derive_adjust
}

Definition of my_laplace() function

The below is the definition of my_laplace() function. This function executes the laplace approximation and returns the object consisting of the posterior mode, posterior covariance matrix, and the log-evidence.

my_laplace <- function(start_guess, logpost_func, ...)
{
  # code adapted from the `LearnBayes` function `laplace()`
  fit <- optim(start_guess,
               logpost_func,
               gr = NULL,
               ...,
               method = "BFGS",
               hessian = TRUE,
               control = list(fnscale = -1, maxit = 1001))
  
  mode <- fit$par
  post_var_matrix <- -solve(fit$hessian)
  p <- length(mode)
  int <- p/2 * log(2 * pi) + 0.5 * log(det(post_var_matrix)) + logpost_func(mode, ...)
  # package all of the results into a list
  list(mode = mode,
       var_matrix = post_var_matrix,
       log_evidence = int,
       converge = ifelse(fit$convergence == 0,
                         "YES", 
                         "NO"),
       iter_counts = as.numeric(fit$counts[1]))
}
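
As a sanity check of this machinery on a problem with a known answer (my own toy example, not part of the project data): for a normal likelihood with known σ = 1 and a normal prior on the mean, the posterior is exactly Gaussian, so the optim()-based mode and Hessian should reproduce the conjugate result almost exactly.

```r
# Toy 1-parameter check: N(theta, 1) likelihood, N(0, 10^2) prior on theta
set.seed(1)
yobs <- rnorm(25, mean = 2, sd = 1)

toy_logpost <- function(theta, my_info) {
  sum(dnorm(my_info$yobs, mean = theta, sd = 1, log = TRUE)) +
    dnorm(theta, mean = 0, sd = 10, log = TRUE)
}

# same optim() pattern as my_laplace(): maximize via fnscale = -1
fit <- optim(0, toy_logpost, gr = NULL, my_info = list(yobs = yobs),
             method = "BFGS", hessian = TRUE, control = list(fnscale = -1))

laplace_mode <- fit$par
laplace_sd   <- sqrt(-1 / fit$hessian[1, 1])

# Exact conjugate posterior for comparison
post_prec  <- length(yobs) / 1^2 + 1 / 10^2
exact_mean <- sum(yobs) / post_prec
exact_sd   <- sqrt(1 / post_prec)
```

Because the log-posterior here is exactly quadratic, the Laplace approximation is exact up to numerical error; for the spline models it is only an approximation around the posterior mode.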

Executing the Laplace Approximation for the model 10 formulation and the model 9 formulation.

For Model 10:

# Execute the Laplace Approximation for Model10
laplace_mod10_weak <- my_laplace(rep(0, ncol(X10) + 1), lm_logpost, info_mod10_weak)

# Confirm convergence for Model10
convergence_mod10 <- laplace_mod10_weak$converge

# Print convergence status for Model10
cat("Model 10 Convergence:", convergence_mod10, "\n")
## Model 10 Convergence: YES

For Model9:

# Execute the Laplace Approximation for Model9
laplace_mod9_weak <- my_laplace(rep(0, ncol(X9) + 1), lm_logpost, info_mod9_weak)

# Confirm convergence for Model9
convergence_mod9 <- laplace_mod9_weak$converge

# Print convergence status for Model9
cat("Model 9 Convergence:", convergence_mod9, "\n")
## Model 9 Convergence: YES

Creating the posterior summary visualization figure for model 10 and model 9.

Below is a function that creates a coefficient summary plot in the style of the coefplot() function, but uses the Bayesian results from the Laplace Approximation.

viz_post_coefs <- function(post_means, post_sds, xnames)
{
  tibble::tibble(
    mu = post_means,
    sd = post_sds,
    x = xnames
  ) %>% 
    mutate(x = factor(x, levels = xnames)) %>% 
    ggplot(mapping = aes(x = x)) +
    geom_hline(yintercept = 0, color = 'grey', linetype = 'dashed') +
    geom_point(mapping = aes(y = mu)) +
    geom_linerange(mapping = aes(ymin = mu - 2 * sd,
                                 ymax = mu + 2 * sd,
                                 group = x)) +
    labs(x = 'feature', y = 'coefficient value') +
    coord_flip() +
    theme_bw()
}

Creating the posterior summary visualization figure for model 10

The coefficient posterior summaries for mod10 are visualized below. We can see that the number of columns in the X10 design matrix are used to identify the number of regression coefficients (beta parameters).

viz_post_coefs(laplace_mod10_weak$mode[1:ncol(X10)],
               sqrt(diag(laplace_mod10_weak$var_matrix)[1:ncol(X10)]),
               colnames(X10))

Creating the posterior summary visualization figure for model 9

Likewise, the posterior coefficient summaries for mod9 are shown below. Again the number of columns in the design matrix is used to identify the regression coefficients.

viz_post_coefs(laplace_mod9_weak$mode[1:ncol(X9)],
               sqrt(diag(laplace_mod9_weak$var_matrix)[1:ncol(X9)]),
               colnames(X9))

After fitting the 2 models, you must identify the best model -Which performance metric did you use to make your selection?

Using the Bayes Factor to identify the better of the models with weak prior.

For identifying the best model we are using the Bayes Factor:

I am using log-evidence arithmetic to calculate the Bayes Factor: the approach is to exponentiate the difference of the log-evidences.

# Calculate the Bayes Factor
exp( laplace_mod9_weak$log_evidence - laplace_mod10_weak$log_evidence )
## [1] 6.259078e+47

A Bayes Factor greater than 1 means there is more "evidence" supporting the "numerator model" than the model in the denominator. As shown above, the Bayes Factor is on the order of 1e47, which is an enormous number: the evidence sees essentially no reason to consider mod10 compared to mod9 under the weak prior.

The output therefore strongly favors Model 9 over Model 10. This extremely large value provides compelling evidence in support of Model 9. In essence, considering the data and the assumed weak prior beliefs, Model 9 is significantly preferred over Model 10.

Hence we can say that the best performing model with the weak prior is Model 9.
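
Working with log-evidences is not just convenient but necessary: the raw evidences overflow double precision, while their logs subtract safely. A sketch with hypothetical stand-in numbers, chosen only so the difference of about 110 reproduces the order of magnitude seen above:

```r
# Hypothetical log-evidence values for illustration only
log_ev_num   <- 1300   # stand-in for the numerator model's log-evidence
log_ev_denom <- 1190   # stand-in for the denominator model's log-evidence

# exp() of either value alone overflows double precision ...
exp(log_ev_num)                          # Inf (exp overflows above ~709.78)

# ... but the difference is perfectly representable:
bf <- exp(log_ev_num - log_ev_denom)     # exp(110), roughly 6e47

# Reporting the order of magnitude directly is often clearer:
log10_bf <- (log_ev_num - log_ev_denom) / log(10)
```

For even larger evidence gaps, reporting log10 of the Bayes Factor avoids ever leaving log space.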

Fitting the Bayesian model using a strong prior

Now, I’m trying a more informative or strong prior by reducing the prior standard deviation on the regression coefficients from 50 to 1. The prior mean will still be zero.

Design Matrix for model10 and model9:

info_mod10_strong <- list(
  yobs = trained_data$y,
  design_matrix = X10,
  mu_beta = 0,
  tau_beta = 1,
  sigma_rate = 1
)

info_mod9_strong <- list(
  yobs = trained_data$y,
  design_matrix = X9,
  mu_beta = 0,
  tau_beta = 1,
  sigma_rate = 1
)

Executing the Laplace Approximation.

# Execute the Laplace Approximation for Model9
laplace_mod9_strong <- my_laplace(rep(0, ncol(X9) + 1), lm_logpost, info_mod9_strong)

# Confirm convergence for Model9
convergence_mod9 <- laplace_mod9_strong$converge

# Print convergence status for Model9
cat("Model 9 Convergence:", convergence_mod9, "\n")
## Model 9 Convergence: YES
# Execute the Laplace Approximation for Model10
laplace_mod10_strong <- my_laplace(rep(0, ncol(X10) + 1), lm_logpost, info_mod10_strong)

# Confirm convergence for Model10
convergence_mod10 <- laplace_mod10_strong$converge

# Print convergence status for Model10
cat("Model 10 Convergence:", convergence_mod10, "\n")
## Model 10 Convergence: YES

Next, we use the viz_post_coefs() function to visualize the posterior coefficient summaries for model 10 and model 9 based on the strong prior specification, to see whether the stronger prior restricts the coefficients beyond the range the data (likelihood) supports.

Visualize the regression coefficient posterior summary statistics.

Using the viz_post_coefs() function to visualize the posterior coefficient summaries for model 9 and model 10, based on the strong prior specification.

viz_post_coefs(laplace_mod10_strong$mode[1:ncol(X10)],
               sqrt(diag(laplace_mod10_strong$var_matrix)[1:ncol(X10)]),
               colnames(X10))

The coefficient posteriors for mod10 still exhibit significant uncertainty. It’s crucial to observe the x-axis scale in the figure, which shows that all coefficient posterior means fall within the range of -3 to +4. However, when employing the weak prior, the posterior coefficient summaries for mod10 covered a much broader range, spanning from -50 to 50. With the weak prior, many coefficients had posterior means in the double digits. The adoption of a more restrictive “strong” prior, characterized by a prior standard deviation of 1, is constraining the coefficients from attaining values supported solely by the data for mod10.

The posterior coefficient summaries based on the strong prior for model 9 are shown below.

viz_post_coefs(laplace_mod9_strong$mode[1:ncol(X9)],
               sqrt(diag(laplace_mod9_strong$var_matrix)[1:ncol(X9)]),
               colnames(X9))

The posterior coefficient summaries for mod9 seem to align closely with what we observed under the weak prior. Consequently, the stronger prior is merely confining the coefficients within the range already supported by the data (likelihood). As a result, the posterior for this particular model and dataset remains largely unaffected by this specific prior.

Performance of the model with Bayes Factors again, but considering the results based on the strong prior.

exp( laplace_mod9_strong$log_evidence - laplace_mod10_strong$log_evidence )
## [1] 4.892183e-37

This time the Bayes Factor (mod9 relative to mod10) is far below 1, on the order of 1e-37, so under the strong prior the evidence overwhelmingly favors mod10 over mod9. The preferred model has flipped relative to the weak-prior comparison, which motivates checking the selection with additional metrics.

Let's try to find the best model using the AIC/BIC method:

# Calculate AIC and BIC for Model 10
aic_mod10 <- AIC(lm(y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness, data = trained_data))
bic_mod10 <- BIC(lm(y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness, data = trained_data))

# Calculate AIC and BIC for Model 9
aic_mod9 <- AIC(lm(y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3), data = trained_data))
bic_mod9 <- BIC(lm(y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3), data = trained_data))

# Print AIC and BIC for both models
cat("AIC for Model 10:", aic_mod10, "\n")
## AIC for Model 10: -2494.083
cat("BIC for Model 10:", bic_mod10, "\n")
## BIC for Model 10: -2158.435
cat("AIC for Model 9:", aic_mod9, "\n")
## AIC for Model 9: -2138.961
cat("BIC for Model 9:", bic_mod9, "\n")
## BIC for Model 9: -2072.777

According to the above outputs, after fitting the 2 models, we identified the best model based on AIC and BIC. These metrics are commonly used for model selection, with lower values indicating better-fitting models.

Model 10 achieved an AIC of -2494.083 and a BIC of -2158.435. Model 9 achieved an AIC of -2138.961 and a BIC of -2072.777.

Since Model 10 exhibited lower AIC and BIC values compared to Model 9, we can conclude that Model 10 is the best model based on these performance metrics.

Visualize the regression coefficient posterior summary statistics for your best model.

We have taken Model 10 as our best model. Coefficient posterior summary statistics for Model 10 is as follows:

viz_post_coefs(laplace_mod10_strong$mode[1:ncol(X10)],
               sqrt(diag(laplace_mod10_strong$var_matrix)[1:ncol(X10)]),
               colnames(X10))

The coefficient posterior summaries for mod10 under the strong prior are shown above, and the weak-prior summaries are repeated below for comparison. Initially, it may appear that the coefficient posteriors for mod10 remain uncertain. However, it's essential to observe the x-axis scale: under the strong prior, all coefficient posterior means fall within roughly -2 to +4. In contrast, the posterior coefficient summaries for mod10 under the weak prior span a much broader range, with many coefficients having posterior means in the double digits. The more restrictive "strong" prior, with a prior standard deviation of 1, constrains the coefficients from reaching the values supported solely by the data for mod10!

viz_post_coefs(laplace_mod10_weak$mode[1:ncol(X10)],
               sqrt(diag(laplace_mod10_weak$var_matrix)[1:ncol(X10)]),
               colnames(X10))

For your best model: Study the posterior UNCERTAINTY on the likelihood noise (residual error), 𝜎

# Extract the posterior standard deviation of the unbounded noise parameter,
# varphi = log(sigma), from the posterior covariance matrix of mod10
posterior_sigma_uncertainty <- sqrt(diag(laplace_mod10_strong$var_matrix))[ncol(X10) + 1]

# Print the posterior uncertainty on sigma (𝜎)
cat("Posterior uncertainty on sigma (𝜎) for Model 10:", posterior_sigma_uncertainty, "\n")
## Posterior uncertainty on sigma (𝜎) for Model 10: 0.02452012

How does the lm() maximum likelihood estimate (MLE) on 𝜎 relate to the posterior UNCERTAINTY on 𝜎?

The maximum likelihood estimate (MLE) of 𝜎, obtained from the lm() function, serves as a single-point estimate of the standard deviation parameter based solely on the observed data. It results from maximizing the likelihood function, which gauges the likelihood of observing the data given a specific value of 𝜎.

On the other hand, the posterior uncertainty on 𝜎, derived from Bayesian inference, encompasses the distribution of potential 𝜎 values, taking into account both the observed data and any prior information. This uncertainty is encapsulated by the posterior distribution, which considers the entire spectrum of plausible 𝜎 values along with their associated probabilities.

The relationship between the MLE and the posterior uncertainty on 𝜎 hinges on the chosen prior for the Bayesian analysis. A more informative prior, like a strong prior with a smaller standard deviation, has the potential to constrain the posterior distribution, thereby reducing uncertainty compared to the MLE. Conversely, a less informative prior, such as a weak prior with a larger standard deviation, may yield a broader posterior distribution, resulting in heightened uncertainty.
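To make the contrast concrete, the sketch below uses a toy lm() fit on mtcars (not mod10) to show that lm() reports sigma as a single number with no accompanying spread, which is exactly what the posterior standard deviation above adds.

```r
# Toy fit on mtcars (not mod10) to illustrate the lm() point estimate of sigma
fit <- lm(mpg ~ wt + hp, data = mtcars)

# sigma() returns the residual standard error: a single number with no
# accompanying measure of spread (it divides by n - p, the residual df)
sigma_hat <- sigma(fit)
manual    <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

all.equal(sigma_hat, manual)  # TRUE
```

The Bayesian analysis, by contrast, reports both a central value (the posterior mode) and a posterior standard deviation for sigma, which is the 0.0245 uncertainty printed above.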

To study the posterior coefficient uncertainty, we keep things simple, as we did in the assignments, by assuming a value of σ = 1. Think of this as looking at the scaled uncertainty. We also start by assuming the prior has no impact, i.e., an infinitely diffuse prior with infinite uncertainty.

Do you feel the posterior is precise or are we quite uncertain about 𝜎?

  • Under the weak prior, the coefficient posterior summaries for mod10 exhibited a notably broader range, with numerous coefficients showing posterior means in the double digits. This underscores substantial uncertainty in the coefficient estimates.

  • Employing a strong prior with a standard deviation of 1 constrains the coefficients, preventing them from attaining values supported solely by the data. The shrinkage narrows the posterior spread, but the fact that the prior dominates indicates the data alone leave considerable uncertainty.

  • Bayesian analysis inherently acknowledges uncertainty through the posterior distribution, encapsulating a spectrum of potential 𝜎 values along with their associated probabilities. The prevalence of wide posterior distributions signifies uncertainty in 𝜎 estimation.

Part ii: Regression – iiC) Linear models Predictions

Now we delve into the predictive tendencies of the models to gain a deeper understanding of their behavior. To facilitate this, I’ll craft a prediction grid tailored to our needs using the expand.grid() function. Below is the code snippet that defines the grid:

# Define unique values for Lightness and Saturation
lightness_values <- c("dark", "deep", "light", "midtone", "pale", "saturated", "soft")
saturation_values <- c("bright", "gray", "muted", "neutral", "pure", "subdued", "shaded")

# Generate the grid
viz_grid <- expand.grid(
  R = sample(0:255, 2),  # Adjust the number of values sampled as needed
  G = sample(0:255, 2),  # Adjust the number of values sampled as needed
  B = sample(0:255, 2),  # Adjust the number of values sampled as needed
  Lightness = lightness_values,
  Saturation = saturation_values,
  Hue = sample(0:36, 2)
)

# Generate random integers for R, G, B for each row
viz_grid$R <- sample(0:255, nrow(viz_grid), replace = TRUE)
viz_grid$G <- sample(0:255, nrow(viz_grid), replace = TRUE)
viz_grid$B <- sample(0:255, nrow(viz_grid), replace = TRUE)
viz_grid$Hue <- sample(0:36, nrow(viz_grid), replace = TRUE)

# Display the structure of viz_grid
glimpse(viz_grid)
## Rows: 784
## Columns: 6
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…

The glimpse provided above reveals 784 combinations of the 6 inputs, so nearly 800 predictions will be made per model to examine the predictive trends of the mean response, along with confidence and prediction intervals around it.

Next, we will summarize the posterior predictions of the mean and the random response. Predictions will be made for each of the models, and their trends will be visualized. To facilitate this process, a function named tidy_predict() has been created. This function consolidates the predicted mean trend, the confidence interval, and the prediction interval into a tibble, along with the input values, streamlining the visualization process.

tidy_predict <- function(mod, xnew)
{
  pred_df <- predict(mod, xnew, interval = "confidence") %>% 
    as.data.frame() %>% tibble::as_tibble() %>% 
    dplyr::select(pred = fit, ci_lwr = lwr, ci_upr = upr) %>% 
    bind_cols(predict(mod, xnew, interval = 'prediction') %>% 
                as.data.frame() %>% tibble::as_tibble() %>% 
                dplyr::select(pred_lwr = lwr, pred_upr = upr))
  
  xnew %>% bind_cols(pred_df)
}

The tidy_predict() function takes two arguments:

  1. An lm() model object, which serves as the trained model.

  2. A new or test dataframe containing the input variables for prediction.

When utilizing lm() and its predict() method, these functions automatically generate the test design matrix based on the formula stored within the lm() model object. This ensures consistency with the training design matrix used during model training.
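A minimal illustration with a toy lm() fit on mtcars (not one of our models) shows the three-column matrix that predict() returns for each interval type, which is exactly what tidy_predict() reshapes into a tibble:

```r
# Toy fit (mtcars, not one of our models) to show the matrix predict() returns
fit  <- lm(mpg ~ wt, data = mtcars)
xnew <- data.frame(wt = c(2.5, 3.5))

ci <- predict(fit, xnew, interval = "confidence")
pi <- predict(fit, xnew, interval = "prediction")

colnames(ci)  # "fit" "lwr" "upr"

# The prediction interval is always at least as wide as the confidence
# interval: it adds the residual noise on top of the mean-trend uncertainty
all(pi[, "lwr"] <= ci[, "lwr"] & ci[, "upr"] <= pi[, "upr"])  # TRUE
```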

Making predictions with the 2 models selected from Part iiA, the non-Bayesian linear models mod9 and mod10, using the visualization grid viz_grid. The predictions are assigned to the variables pred_lm_09 and pred_lm_10; for completeness, predictions are generated for the other models as well.

pred_lm_02 <- tidy_predict(mod2, viz_grid)
pred_lm_03 <- tidy_predict(mod3, viz_grid)
pred_lm_04 <- tidy_predict(mod4, viz_grid)
pred_lm_05 <- tidy_predict(mod5, viz_grid)
pred_lm_06 <- tidy_predict(mod6, viz_grid)
pred_lm_07 <- tidy_predict(mod7, viz_grid)
pred_lm_08 <- tidy_predict(mod8, viz_grid)
pred_lm_09 <- tidy_predict(mod9, viz_grid)
pred_lm_10 <- tidy_predict(mod10, viz_grid)

A glimpse of the pred_lm_09 and pred_lm_10 tibble is shown below.

pred_lm_09 %>% glimpse()
## Rows: 784
## Columns: 11
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ pred       <dbl> -1.9297737273, 0.9624680924, -0.5585489800, 1.0194943026, 0…
## $ ci_lwr     <dbl> -1.95081488, 0.88688018, -0.59401714, 0.98450432, 0.3746002…
## $ ci_upr     <dbl> -1.90873258, 1.03805601, -0.52308082, 1.05448429, 0.4945633…
## $ pred_lwr   <dbl> -2.06223092, 0.81141939, -0.69404868, 0.88411899, 0.2907069…
## $ pred_upr   <dbl> -1.79731654, 1.11351679, -0.42304928, 1.15486962, 0.5784566…
pred_lm_10 %>% glimpse()
## Rows: 784
## Columns: 11
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ pred       <dbl> -1.89408265, -0.64997340, -0.59994767, -1.11195226, -0.0907…
## $ ci_lwr     <dbl> -1.92345331, -1.89043936, -1.51621101, -7.51599930, -0.4042…
## $ ci_upr     <dbl> -1.8647120, 0.5904926, 0.3163157, 5.2920948, 0.2227305, 0.2…
## $ pred_lwr   <dbl> -2.00058441, -1.89465641, -1.52191215, -7.51681748, -0.4205…
## $ pred_upr   <dbl> -1.78758090, 0.59470961, 0.32201682, 5.29291297, 0.23902221…

Part ii: Regression – iiD) Train/tune with resampling

Using Linear Models:

According to the models we have generated, the linear models to train and tune are:

=> All categorical and continuous inputs, linear additive features - mod4 (y ~ Lightness + Saturation + R + G + B + Hue)

=> Categorical inputs added to all main effects and all pairwise interactions of the continuous inputs - mod6 (y ~ (R + G + B + Hue)^2 + Lightness + Saturation)

=> The 2 models selected from iiA), which are mod9 and mod10:

mod9 - y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3)

mod10 - y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness

So, per the instructions, we train and tune mod4, mod6, mod9, and mod10.

Specifying the resampling scheme

library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
my_ctrl <- trainControl(method = "cv", number = 5)

In the above code, we are using 5-fold cross-validation. Each model will be trained 5 times and tested 5 times.

Selecting a primary performance metric to use to compare the models.

We need a performance metric associated with regression, so I'm choosing RMSE. With 5-fold cross-validation, each fold's holdout test set contains 20% of the data.

my_metric <- 'RMSE'
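For reference, RMSE is simply the square root of the mean squared error between the observed and predicted values. A minimal base-R sketch, using hypothetical vectors rather than our data:

```r
# RMSE as a plain base-R function, with hypothetical vectors for illustration
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual    <- c(1.0, 2.0, 3.0, 4.0)
predicted <- c(1.1, 1.9, 3.2, 3.8)
rmse(actual, predicted)  # sqrt(0.025), about 0.158
```

Because the errors are squared before averaging, RMSE penalizes large misses more heavily than MAE does, and it is reported in the same units as the response.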

Training and Tuning the models

Training Model 4

set.seed(2001)
trained_mod4 <- train(y ~ Lightness + Saturation + R + G + B + Hue, 
                       data = trained_data,
                       method = "lm",
                       metric = my_metric,
                       trControl = my_ctrl)

trained_mod4
## Linear Regression 
## 
## 835 samples
##   6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 668, 668, 669, 667, 668 
## Resampling results:
## 
##   RMSE        Rsquared   MAE       
##   0.08961727  0.9943377  0.06734552
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Training Model 6

set.seed(2001)
trained_mod6 <- train(y ~ (R + G + B + Hue)^2 + Lightness + Saturation, 
                       data = trained_data,
                       method = "lm",
                       metric = my_metric,
                       trControl = my_ctrl)

trained_mod6
## Linear Regression 
## 
## 835 samples
##   6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 668, 668, 669, 667, 668 
## Resampling results:
## 
##   RMSE        Rsquared   MAE       
##   0.08053766  0.9954601  0.06089084
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Training Model 9:

set.seed(2001)
trained_mod9 <- train(y ~ ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3) + ns(Hue, df = 3), 
                       data = trained_data,
                       method = "lm",
                       metric = my_metric,
                       trControl = my_ctrl)

trained_mod9
## Linear Regression 
## 
## 835 samples
##   4 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 668, 668, 669, 667, 668 
## Resampling results:
## 
##   RMSE        Rsquared   MAE       
##   0.06751746  0.9967728  0.04612248
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Training Model10:

set.seed(2001)
trained_mod10 <- train(
   y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness, 
  data = trained_data,
  method = "lm",
  metric = my_metric,
  trControl = my_ctrl
)

trained_mod10
## Linear Regression 
## 
## 835 samples
##   4 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 668, 668, 669, 667, 668 
## Resampling results:
## 
##   RMSE        Rsquared   MAE     
##   0.06381023  0.9969635  0.039008
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

Model Training Results:

The code chunk below compiles all of the model training results for you. The trained_results object can be used to compare the models through tables and visualizations.

trained_results = resamples(list(fit_4 = trained_mod4,
                                 fit_6 = trained_mod6,
                                 fit_9 = trained_mod9,
                                 fit_10 = trained_mod10))

Summary Tables for the trained models

trained_results |> summary(metric = 'RMSE')
## 
## Call:
## summary.resamples(object = trained_results, metric = "RMSE")
## 
## Models: fit_4, fit_6, fit_9, fit_10 
## Number of resamples: 5 
## 
## RMSE 
##              Min.    1st Qu.     Median       Mean    3rd Qu.       Max. NA's
## fit_4  0.08350267 0.08652551 0.08872158 0.08961727 0.09217503 0.09716155    0
## fit_6  0.07421857 0.07906394 0.07919018 0.08053766 0.08024848 0.08996715    0
## fit_9  0.05901126 0.06521073 0.06581390 0.06751746 0.06667169 0.08087970    0
## fit_10 0.05022380 0.05431282 0.05571196 0.06381023 0.06604841 0.09275416    0
trained_results |> dotplot(metric = 'RMSE')

Now that we have trained the models and drawn the dot plot above, I can see that fit_10 (mod10) has the lowest RMSE, which makes it the best model.

Regularized regression with Elastic net

According to the given instructions we are now training the following models:

• Add categorical inputs to all main effect and all pairwise interactions of continuous inputs - mod6 (y ~ (R + G + B + Hue)^2 + Lightness + Saturation)

• The more complex of the 2 models selected from iiA) - Mod10 is the more complex model from the 2 models selected in Part2A - mod10 (y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness)

We train, assess, and tune the elastic net models using the default caret tuning grid, reusing the mod6 and mod10 formulas above: mod6 adds the categorical main effects to all pairwise interactions of the continuous inputs, while mod10 interacts the natural-spline continuous features with Lightness. The response remains the continuous y.

set.seed(1234)

enet_default_mod6 <- train( y ~ (R + G + B + Hue)^2 + Lightness + Saturation,
                       data = trained_data,
                       method = 'glmnet',
                       metric = my_metric,
                       preProcess = c("center", "scale"),
                       trControl = my_ctrl)

enet_default_mod10 <- train( y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness,
                       data = trained_data,
                       method = 'glmnet',
                       metric = my_metric,
                       preProcess = c("center", "scale"),
                       trControl = my_ctrl)

enet_default_mod6
## glmnet 
## 
## 835 samples
##   6 predictor
## 
## Pre-processing: centered (22), scaled (22) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 668, 668, 668, 668, 668 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       RMSE        Rsquared   MAE       
##   0.10   0.002324946  0.08179611  0.9952651  0.06177323
##   0.10   0.023249462  0.08779076  0.9946092  0.06486433
##   0.10   0.232494618  0.14937783  0.9876432  0.11017555
##   0.55   0.002324946  0.08383331  0.9950262  0.06251216
##   0.55   0.023249462  0.09643629  0.9935995  0.06990620
##   0.55   0.232494618  0.20085400  0.9893951  0.14960444
##   1.00   0.002324946  0.08446765  0.9949473  0.06263570
##   1.00   0.023249462  0.10221982  0.9929677  0.07338187
##   1.00   0.232494618  0.26695708  0.9892905  0.20873090
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.002324946.
enet_default_mod10
## glmnet 
## 
## 835 samples
##   4 predictor
## 
## Pre-processing: centered (69), scaled (69) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 667, 668, 669, 668, 668 
## Resampling results across tuning parameters:
## 
##   alpha  lambda       RMSE        Rsquared   MAE       
##   0.10   0.001924909  0.05845383  0.9976002  0.03900669
##   0.10   0.019249086  0.07006673  0.9966585  0.04629241
##   0.10   0.192490860  0.17618482  0.9847024  0.11575708
##   0.55   0.001924909  0.06185743  0.9973119  0.04042714
##   0.55   0.019249086  0.07621231  0.9963884  0.05055384
##   0.55   0.192490860  0.29214725  0.9709397  0.20976599
##   1.00   0.001924909  0.06406338  0.9971187  0.04163942
##   1.00   0.019249086  0.09046666  0.9951489  0.05972747
##   1.00   0.192490860  0.41538491  0.9371825  0.29688530
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001924909.
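Both default fits settled on alpha = 0.1 with a small lambda. For reference, the penalty that glmnet adds to the least-squares loss blends the ridge and lasso terms; a small base-R sketch (with made-up coefficients, not our fitted values) shows how alpha mixes the two:

```r
# Elastic net penalty as parameterized by glmnet:
#   lambda * ( (1 - alpha)/2 * sum(beta^2) + alpha * sum(|beta|) )
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -0.25, 0)  # made-up coefficients, not our fitted values

enet_penalty(beta, lambda = 0.02, alpha = 0)    # pure ridge penalty
enet_penalty(beta, lambda = 0.02, alpha = 1)    # pure lasso penalty
enet_penalty(beta, lambda = 0.02, alpha = 0.1)  # the mix caret selected here
```

Alpha near 0 behaves like ridge (shrinking coefficients smoothly), while alpha near 1 behaves like lasso (setting weak coefficients exactly to zero), which is why some coefficients print as '.' below.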

Creating a custom tuning grid to further tune the elastic net lambda and alpha tuning parameters.

Creating a tuning grid with the expand.grid() function which has two columns named alpha and lambda. The alpha variable should be evenly spaced between 0.1 and 1.0 by increments of 0.1. The lambda variable should have 25 evenly spaced values in the log-space between the minimum and maximum lambda values from the caret default tuning grid.

my_lambda_grid_mod6 <- exp(seq(log(min(enet_default_mod6$results$lambda)),
                          log(max(enet_default_mod6$results$lambda)),
                          length.out = 25))

my_lambda_grid_mod10 <- exp(seq(log(min(enet_default_mod10$results$lambda)),
                          log(max(enet_default_mod10$results$lambda)),
                          length.out = 25))

The enet_grid is defined below by combining the my_lambda_grid with a grid of alpha or mixing fraction values between 0.1 and 1.0.

enet_grid_mod6 <- expand.grid(alpha = seq(0.1, 1.0, by = 0.1),
                         lambda = my_lambda_grid_mod6)
enet_grid_mod10 <- expand.grid(alpha = seq(0.1, 1.0, by = 0.1),
                         lambda = my_lambda_grid_mod10)

The number of combinations generated by expand.grid() can be found by checking the dimensions of enet_grid. As shown below, we will try out 250 tuning parameter combinations!

enet_grid_mod6 %>% dim()
## [1] 250   2
enet_grid_mod10 %>% dim()
## [1] 250   2

Training, assessing, and tuning the elastic net model with the custom tuning grid

set.seed(1234)
enet_tune_mod6 <- train(y ~ (R + G + B + Hue)^2 + Lightness + Saturation,
                   data = trained_data,
                   method = 'glmnet',
                   metric = my_metric,
                   tuneGrid = enet_grid_mod6,
                   preProcess = c("center", "scale"),
                   trControl = my_ctrl)

enet_tune_mod10 <- train(y ~ (ns(R, df = 3) + ns(G, df = 3) + ns(B, df = 3)) * Lightness,
                   data = trained_data,
                   method = 'glmnet',
                   metric = my_metric,
                   tuneGrid = enet_grid_mod10,
                   preProcess = c("center", "scale"),
                   trControl = my_ctrl)

plot(enet_tune_mod6, xTrans = log)

plot(enet_tune_mod10, xTrans = log)

Looking closely at the resampled averaged RMSE values displayed in the above 2 plots, multiple tuning parameter combinations perform roughly the same as the overall best. The overall best averaged RMSE is only slightly better than the next-best result. By default caret chooses the overall best, but a "pure" lasso (alpha = 1) may well fall within one standard error of the best tuning parameter values. Extracting one-standard-error-rule results is tricky to do in caret, so I'm going with the value caret recommends.

Finding out the Best Tune

enet_tune_mod6$bestTune
##    alpha      lambda
## 26   0.2 0.002324946
enet_tune_mod10$bestTune
##   alpha      lambda
## 1   0.1 0.001924909

Printing the coefficients for the tuned elastic net models for mod6

coef(enet_tune_mod6$finalModel, s = enet_tune_mod6$bestTune$lambda)
## 23 x 1 sparse Matrix of class "dgCMatrix"
##                              s1
## (Intercept)        -0.109359649
## R                   0.184777485
## G                   0.565321648
## B                   .          
## Hue                -0.016944750
## Lightnessdeep       0.020788290
## Lightnesslight     -0.012144467
## Lightnessmidtone   -0.024041176
## Lightnesspale       0.031750403
## Lightnesssaturated  0.003570312
## Lightnesssoft      -0.027561446
## Saturationgray     -0.020020043
## Saturationmuted    -0.008547795
## Saturationneutral  -0.024032891
## Saturationpure      0.015234729
## Saturationshaded   -0.019120060
## Saturationsubdued  -0.018949183
## R:G                 0.209375662
## R:B                 .          
## R:Hue              -0.069517957
## G:B                 0.276884905
## G:Hue               0.089941061
## B:Hue               .

From the above coefficients for the tuned elastic net model for mod6, we can see the impact of each predictor on the response. 'R' and 'G' have positive coefficients, suggesting that higher values of these inputs correlate with higher response values. Several coefficients have been shrunk exactly to zero (shown as '.'): the 'B' main effect and the 'R:B' and 'B:Hue' interactions. The lasso component of the penalty has removed these weak features. Interaction terms such as 'R:G' and 'G:B' have positive coefficients, implying a stronger combined effect.

Printing the coefficients for the tuned elastic net models for mod10

coef(enet_tune_mod10$finalModel, s = enet_tune_mod10$bestTune$lambda)
## 70 x 1 sparse Matrix of class "dgCMatrix"
##                                              s1
## (Intercept)                       -1.093596e-01
## ns(R, df = 3)1                     2.502150e-01
## ns(R, df = 3)2                     1.076849e-01
## ns(R, df = 3)3                     2.672034e-01
## ns(G, df = 3)1                     4.443611e-01
## ns(G, df = 3)2                     3.185292e-01
## ns(G, df = 3)3                     5.117305e-01
## ns(B, df = 3)1                     6.016966e-02
## ns(B, df = 3)2                     5.361191e-02
## ns(B, df = 3)3                     1.271397e-01
## Lightnessdeep                      4.080432e-02
## Lightnesslight                     .           
## Lightnessmidtone                   .           
## Lightnesspale                      .           
## Lightnesssaturated                 6.975433e-02
## Lightnesssoft                      .           
## ns(R, df = 3)1:Lightnessdeep      -1.725636e-02
## ns(R, df = 3)2:Lightnessdeep      -4.706049e-03
## ns(R, df = 3)3:Lightnessdeep      -1.931700e-03
## ns(R, df = 3)1:Lightnesslight     -5.825089e-03
## ns(R, df = 3)2:Lightnesslight      2.773144e-02
## ns(R, df = 3)3:Lightnesslight      1.229827e-02
## ns(R, df = 3)1:Lightnessmidtone   -4.129204e-02
## ns(R, df = 3)2:Lightnessmidtone    5.074726e-02
## ns(R, df = 3)3:Lightnessmidtone   -6.513882e-03
## ns(R, df = 3)1:Lightnesspale      -4.097253e-03
## ns(R, df = 3)2:Lightnesspale       3.816349e-02
## ns(R, df = 3)3:Lightnesspale       2.413964e-02
## ns(R, df = 3)1:Lightnesssaturated -4.395484e-02
## ns(R, df = 3)2:Lightnesssaturated -8.485899e-04
## ns(R, df = 3)3:Lightnesssaturated -9.706842e-03
## ns(R, df = 3)1:Lightnesssoft      -2.531327e-02
## ns(R, df = 3)2:Lightnesssoft       5.624812e-02
## ns(R, df = 3)3:Lightnesssoft      -3.230787e-03
## ns(G, df = 3)1:Lightnessdeep       3.425273e-03
## ns(G, df = 3)2:Lightnessdeep       .           
## ns(G, df = 3)3:Lightnessdeep       3.865260e-02
## ns(G, df = 3)1:Lightnesslight     -3.776333e-02
## ns(G, df = 3)2:Lightnesslight      3.718591e-02
## ns(G, df = 3)3:Lightnesslight      5.257972e-02
## ns(G, df = 3)1:Lightnessmidtone   -9.607169e-03
## ns(G, df = 3)2:Lightnessmidtone    2.350067e-02
## ns(G, df = 3)3:Lightnessmidtone    5.870520e-02
## ns(G, df = 3)1:Lightnesspale      -3.882316e-02
## ns(G, df = 3)2:Lightnesspale       4.344124e-02
## ns(G, df = 3)3:Lightnesspale       6.070759e-02
## ns(G, df = 3)1:Lightnesssaturated -5.716917e-03
## ns(G, df = 3)2:Lightnesssaturated  1.041155e-03
## ns(G, df = 3)3:Lightnesssaturated  6.346867e-02
## ns(G, df = 3)1:Lightnesssoft      -3.206448e-02
## ns(G, df = 3)2:Lightnesssoft       1.455584e-02
## ns(G, df = 3)3:Lightnesssoft       5.123277e-02
## ns(B, df = 3)1:Lightnessdeep       9.560984e-03
## ns(B, df = 3)2:Lightnessdeep       9.430035e-03
## ns(B, df = 3)3:Lightnessdeep      -4.056102e-03
## ns(B, df = 3)1:Lightnesslight     -7.635820e-05
## ns(B, df = 3)2:Lightnesslight      2.517171e-03
## ns(B, df = 3)3:Lightnesslight      1.895545e-03
## ns(B, df = 3)1:Lightnessmidtone   -3.851477e-04
## ns(B, df = 3)2:Lightnessmidtone    .           
## ns(B, df = 3)3:Lightnessmidtone    1.648164e-04
## ns(B, df = 3)1:Lightnesspale       .           
## ns(B, df = 3)2:Lightnesspale       2.546985e-02
## ns(B, df = 3)3:Lightnesspale       5.151462e-06
## ns(B, df = 3)1:Lightnesssaturated  3.750926e-03
## ns(B, df = 3)2:Lightnesssaturated  2.118550e-02
## ns(B, df = 3)3:Lightnesssaturated  1.430884e-02
## ns(B, df = 3)1:Lightnesssoft      -4.269368e-06
## ns(B, df = 3)2:Lightnesssoft       .           
## ns(B, df = 3)3:Lightnesssoft      -7.445943e-04

From the above coefficients for mod10, we can say that:

The coefficients from the tuned elastic net model for Model 10 provide valuable insights into the relationship between predictor variables and the response.

Notably, the ‘ns(R, df = 3)’ and ‘ns(G, df = 3)’ variables exhibit varying coefficients across their respective basis functions, indicating complex relationships. For instance, ‘ns(R, df = 3)1’ and ‘ns(R, df = 3)3’ have positive coefficients, suggesting a nonlinear positive association with the response.

We can also see that, ‘ns(R, df = 3)2’ has a smaller positive coefficient, indicating a weaker effect. Similarly, ‘ns(G, df = 3)’ variables also display varying coefficients, with ‘ns(G, df = 3)3’ having the highest positive coefficient, indicating a stronger positive relationship with the response.

Moreover, interaction terms like ‘ns(R, df = 3)1:Lightnessdeep’ and ‘ns(G, df = 3)1:Lightnessdeep’ exhibit both positive and negative coefficients, suggesting nuanced relationships with different levels of the ‘Lightness’ variable.

Using Neural Networks

According to the instructions, I'm training a neural network to predict the response from the input data. Neural networks can learn complex patterns from data, making them well suited to this modeling task.

We are training a neural network using the nnet package.

library(dplyr)

# Keep the inputs and the continuous response y
df_caret <- trained_data %>%
  select(R, G, B, Hue, Lightness, Saturation, y)

glimpse(df_caret)
## Rows: 835
## Columns: 7
## $ R          <dbl> 172, 26, 172, 28, 170, 175, 90, 194, 171, 122, 0, 88, 144, …
## $ G          <dbl> 58, 88, 94, 87, 66, 89, 78, 106, 68, 151, 121, 140, 82, 163…
## $ B          <dbl> 62, 151, 58, 152, 58, 65, 136, 53, 107, 59, 88, 58, 132, 50…
## $ Hue        <dbl> 4, 31, 8, 32, 5, 6, 34, 10, 1, 21, 24, 22, 36, 16, 26, 12, …
## $ Lightness  <chr> "dark", "dark", "dark", "dark", "dark", "dark", "dark", "da…
## $ Saturation <chr> "bright", "bright", "bright", "bright", "bright", "bright",…
## $ y          <dbl> -1.9924302, -2.1972246, -1.6582281, -2.1972246, -2.0907411,…

The above shows the glimpse of df_caret.

my_ctrl <- trainControl(method = 'repeatedcv', number = 10, repeats = 3)

my_metric <- "RMSE"

The metric we chose for the regression models is 'RMSE', and we are using repeated cross-validation (repeatedcv) as our resampling method.

Train, assess, and tune the nnet neural network with the defined resampling scheme. Assign the result to the nnet_default object and print the result to the screen.

Analyzing the performance of the default neural network model.

set.seed(1234)

nnet_default <- train( y ~ R + G + B + Hue + Saturation + Lightness,
                       data = df_caret,
                       method = 'nnet',
                       metric = 'RMSE',
                       preProcess = c("center", "scale"),
                       trControl = my_ctrl,
                       trace = FALSE)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
nnet_default
## Neural Network 
## 
## 835 samples
##   6 predictor
## 
## Pre-processing: centered (16), scaled (16) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 753, 752, 751, 750, 750, 752, ... 
## Resampling results across tuning parameters:
## 
##   size  decay  RMSE       Rsquared   MAE      
##   1     0e+00  1.1880204        NaN  1.0114462
##   1     1e-04  1.1097171  0.7308234  0.8884935
##   1     1e-01  0.9701750  0.7450619  0.6615994
##   3     0e+00  1.1880204        NaN  1.0114462
##   3     1e-04  1.0590670  0.6262949  0.8100747
##   3     1e-01  0.9684767  0.7512178  0.6577233
##   5     0e+00  1.1880204        NaN  1.0114462
##   5     1e-04  1.0372072  0.6680582  0.7702765
##   5     1e-01  0.9680630  0.7526700  0.6569818
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5 and decay = 0.1.

The neural network trained using the nnet method and cross-validated with 10 folds (repeated 3 times) exhibited varying performance across the tuning parameters. The lowest root mean squared error (RMSE) of approximately 0.968 was achieved with a network size of 5 and a decay rate of 0.1, which also gave the best R-squared (about 0.753) and mean absolute error (about 0.657). The missing (NaN) R-squared values at decay = 0 correspond to resamples where the network produced effectively constant predictions, so the correlation underlying R-squared is undefined.

Resampling results

# Access the resampling results
resampling_results <- nnet_default$resample

# Extract the RMSE values from each resample
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_nnd <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_nnd)
## [1] 0.968063

The mean resampled RMSE is approximately 0.968, matching the caret summary for the selected model.

As shown above, the best neural network has size = 5 and decay = 0.1. The best neural network therefore consists of 5 hidden units. This is not a large neural network. We would want to consider trying more hidden units to see if the results can be improved further.
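One way to gauge the size of that network: with the 16 pre-processed inputs reported in the caret output above, a single hidden layer, and 1 output, the nnet weight count (including bias terms) stays small. A quick base-R sketch of the count:

```r
# Weight count for a single-hidden-layer nnet, including bias terms
nnet_weight_count <- function(n_in, n_hidden, n_out) {
  (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
}

# 16 pre-processed inputs (per the caret output above), 5 hidden units, 1 output
nnet_weight_count(16, 5, 1)   # 91
nnet_weight_count(16, 10, 1)  # 181
```

Roughly doubling the hidden units roughly doubles the parameter count, which is why the refined grid below pairs larger sizes with a tuned decay penalty.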

Ploting the graph for nnet_default

plot(nnet_default, xTrans=log)

The plot depicts the relationship between the number of hidden units and the resampled RMSE for each weight decay value (0, 1e-04, and 0.1). Consistent with the table above, the decay = 0 line is flat at an RMSE of about 1.188 regardless of the number of hidden units: without regularization the network fails to learn a useful fit here, which also explains the NaN R-squared values. The decay = 1e-04 line slopes downward, with RMSE improving as hidden units are added. The decay = 0.1 line sits lowest, at roughly 0.968 across all sizes, indicating that the stronger regularization gives the best and most stable performance.

Predictions to understand the behavior of the neural network

Because caret managed the training of the neural network, predictions are made with the same predict() interface used for the previously trained elastic net model.

pred_viz_nnet_probs <- predict( nnet_default, newdata = viz_grid)

viz_nnet_df <- viz_grid %>% bind_cols(pred_viz_nnet_probs)
## New names:
## • `` -> `...7`
viz_nnet_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> 0.0002466206, 0.4275381860, 0.0022749212, 0.9416263744, 0.3…

The glimpse above shows viz_nnet_df; the unnamed ...7 column holds the neural network predictions on the visualization grid.
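The New names: message and the resulting ...7 column appear because predict() returns an unnamed vector. A small base-R sketch, with a made-up grid and made-up predictions, shows how naming the column up front avoids positional indexing later:

```r
# predict() returns an unnamed vector, so bind_cols() invents the
# name `...7`; assigning a named column avoids that.
grid  <- data.frame(R = c(132, 15), G = c(89, 255))
preds <- c(0.00025, 0.42754)    # hypothetical predictions

grid$pred <- preds              # named column instead of `...7`

names(grid)                     # "R" "G" "pred"
```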

Tuning the model for better performance

The more refined neural network tuning grid is used below.

The code chunk below defines a custom tuning grid focused on tuning the decay parameter. The decay parameter is just the regularization strength and thus is similar to lambda used in elastic net with the ridge penalty!

nnet_grid <- expand.grid(size = c(5, 10, 20),
                         decay = exp(seq(-6, 2, length.out=11)))

nnet_grid %>% dim()
## [1] 33  2
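As a sketch of what decay does, the penalized objective can be written out directly: nnet minimizes the data loss plus decay times the sum of squared weights, analogous to the ridge penalty in elastic net. The residuals and weights below are made up for illustration:

```r
# Weight decay adds an L2 penalty on the network weights to the loss.
# For squared-error loss on residuals e and weight vector w:
penalized_loss <- function(e, w, decay) {
  sum(e^2) + decay * sum(w^2)
}

e <- c(0.5, -0.2, 0.1)     # hypothetical residuals
w <- c(1.5, -2.0, 0.3)     # hypothetical network weights

penalized_loss(e, w, decay = 0)    # no regularization
penalized_loss(e, w, decay = 0.1)  # larger decay penalizes big weights more
```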

With the refined grid defined, the neural network is retrained below.

set.seed(1234)

nnet_tune <- train( y ~ R + G + B + Hue + Saturation + Lightness,
                    data = df_caret,
                    method = 'nnet',
                    metric = my_metric,
                    tuneGrid = nnet_grid,
                    preProcess = c("center", "scale"),
                    trControl = my_ctrl,
                    trace = FALSE)

nnet_tune$bestTune
##    size      decay
## 25   20 0.01227734

The neural network model was tuned using a more refined grid, resulting in the selection of optimal tuning parameters. The best performing model was achieved with 20 hidden units and a weight decay of 0.01227734. These parameters were chosen based on the specified evaluation metric and cross-validation settings, indicating their effectiveness in minimizing the error and optimizing model performance.

Tuned neural network RMSE:

# Access the resampling results
resampling_results <- nnet_tune$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_nnt <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_nnt)
## [1] 0.9664089

Plotting the tuned model

plot(nnet_tune, xTrans=log)

The plot of the tuned neural network model reveals a concave shape resembling a half parabola, indicating a non-linear relationship between the number of hidden units and the performance metric. Initially, as the number of hidden units increases logarithmically, the RMSE shows a decreasing trend, suggesting improved model performance with more complexity. However, after reaching a certain point, the RMSE starts to increase again, indicating diminishing returns or overfitting as the model becomes overly complex. This pattern highlights the importance of finding the optimal balance between model complexity and predictive accuracy to avoid overfitting and ensure robust generalization performance.

Predictions to understand the behavior of the tuned neural network

Because caret managed the training of the tuned neural network, predictions are made with the same predict() interface used for the previously trained elastic net model.

pred_viz_nnet_tune_probs <- predict( nnet_tune, newdata = viz_grid)

viz_nnet_tune_df <- viz_grid %>% bind_cols(pred_viz_nnet_tune_probs)
## New names:
## • `` -> `...7`
viz_nnet_tune_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> 7.038663e-07, 3.480905e-01, 1.125417e-04, 9.786874e-01, 2.3…

The pred_viz_nnet_tune_probs predictions are column-bound to the viz_grid dataframe. The new object, viz_nnet_tune_df, holds the tuned model's prediction for each input combination in the visualization grid. The glimpse is given above.

Visualization of Probability of outcome for Neural Network

The graph below shows the predicted response based on R and G with respect to Saturation before tuning.

viz_nnet_df %>% 
  ggplot(mapping = aes(x = G, y = viz_nnet_df[, 7], color = R)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Saturation, labeller = 'label_both') +
  scale_color_gradient(low = "#FF4040", high = "#8B2323") +
  theme_bw()

Based on the graph, preferences vary with the intensity of the green channel (G) and with saturation. For G values below 150, across all saturation levels, there is a noticeable decrease in preference. Conversely, for G values above 200, across nearly all saturation levels, a consistent pattern emerges, suggesting a general dislike for weaker green shades paired with red and high saturation.

This visualization effectively illustrates the nuances in color preferences based on different levels of gray intensity and saturation, shedding light on potential trends and patterns in user preferences to make informed decisions.

The graph below shows the predicted response based on G with respect to Saturation.

viz_nnet_df %>% 
  ggplot(mapping = aes(x = G, y = viz_nnet_df[, 7], color = G)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Saturation, labeller = 'label_both') +
  scale_color_gradient(low = "#7FFF00", high = "#556B2F") +
  theme_bw()

The graph above reveals intriguing insights into color preferences based on varying levels of green intensity (G) and saturation. Key observations:

Low green intensity, all saturation levels: for G values below 150, across all saturation levels, there is a notable decrease in preference. This suggests a tendency to avoid colors with a weaker green channel, regardless of saturation.

High green intensity, all saturation levels: conversely, as G exceeds 200, across nearly all saturation levels, a consistent pattern emerges; users tend to exhibit an aversion to these colors, particularly when paired with red and high saturation.

Let's now see how the plot looks with the tuned model.

Visualizing the Tuned Predicted Probability of outcome for Neural Network

The graph below shows the predicted response based on R and G with respect to Saturation after tuning.

viz_nnet_tune_df %>% 
  ggplot(mapping = aes(x = G, y = viz_nnet_tune_df[, 7], color = R)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Saturation, labeller = 'label_both') +
  scale_color_gradient(low = "#FF4040", high = "#8B2323") +
  theme_bw()

The same trend holds for the tuned model as well. From the plots above, the pattern is essentially the same as for the default plots.

The graph below shows the predicted response based on B and Hue for various shades of Lightness after tuning.

viz_nnet_tune_df %>% 
  
  ggplot(mapping = aes(x = Hue, y = viz_nnet_tune_df[, 7], color = B)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

Plotting against Hue reveals more variation. For instance, respondents tend to prefer shades with blue (B) values greater than 100, and for most Hue values the predictions sit close to the axis, suggesting that Hue alone drives little preference.

Using Random forest

Random forest is a tree-based method. Tree-based models do not have the same preprocessing requirements as other models, so the preProcess argument is not needed in the caret::train() function call.
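One way to see why centering and scaling are unnecessary for trees: a split of the form x <= c depends only on the ordering of the predictor values, and ordering is unchanged by centering and scaling. A quick check on made-up values:

```r
# A tree split "x <= c" partitions observations by rank, so any
# monotone rescaling of x yields exactly the same partitions.
x <- c(172, 26, 170, 90, 122)
x_scaled <- as.numeric(scale(x))   # center and scale

# The ordering, and therefore every possible split, is identical:
identical(order(x), order(x_scaled))
```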

Below I’m training the random forest model. The formula interface uses the . operator to streamline selecting all inputs in the df_caret dataframe.

set.seed(1234)

rf_default <- train( y ~ .,
                     data = df_caret,
                     method = 'rf',
                     metric = my_metric,
                     trControl = my_ctrl,
                     importance = TRUE)

rf_default
## Random Forest 
## 
## 835 samples
##   6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 753, 752, 751, 750, 750, 752, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE        Rsquared   MAE       
##    2    0.22739097  0.9762064  0.16676198
##    9    0.06922738  0.9967591  0.05109468
##   16    0.08493549  0.9949463  0.06176705
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 9.

RMSE for rf:

# Access the resampling results
resampling_results <- rf_default$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_rfd <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_rfd)
## [1] 0.06922738

Random forest behavior through predictions.

Let’s make predictions on the visualization grid, viz_grid, using the random forest model rf_default. Because this is a regression model, predict() returns numeric predictions directly, so no type argument is needed.

pred_viz_rf_probs <- predict( rf_default, newdata = viz_grid)

The pred_viz_rf_probs predictions are column-bound to the viz_grid dataframe. The new object, viz_rf_df, holds the random forest prediction for each input combination in the visualization grid.

viz_rf_df <- viz_grid %>% bind_cols(pred_viz_rf_probs)
## New names:
## • `` -> `...7`
viz_rf_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> -1.8564834, 0.8136553, -0.6959419, 0.8602689, 0.4913258, -0…

The glimpse reveals that the unnamed ...7 column stores the random forest predictions.

Plotting the rf_default graph

plot(rf_default, xTrans=log)

Based on the plot of the Random Forest model, the RMSE varies with the number of predictors sampled at each split (mtry). RMSE drops sharply as mtry increases from 2 to 9, where it reaches its minimum of about 0.069, and then rises slightly at mtry = 16. This suggests that an intermediate mtry balances the strength of individual trees against the decorrelation benefit of random predictor sampling, and mtry = 9 was selected for the final model.
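For intuition, mtry is the number of predictors randomly sampled as split candidates at each node; smaller values decorrelate the trees while larger values strengthen individual trees. A sketch of what happens at a single node (the sampled subset shown is illustrative):

```r
# At each split, a random forest considers only a random subset of
# mtry predictors rather than all of them.
predictors <- c("R", "G", "B", "Hue", "Saturation", "Lightness")

set.seed(42)
candidates_at_node <- sample(predictors, size = 2)  # e.g. mtry = 2
candidates_at_node
```

Note that mtry values of 9 and 16 are possible even though there are six inputs, because the formula interface expands the Lightness and Saturation factors into dummy columns, giving the model matrix more than six predictor columns.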

Tuning the rf model

Let's tune the model to see if we can reduce the RMSE further by expanding the grid of mtry values.

# Define the tuning grid for Random Forest
rf_grid <- expand.grid(mtry = c(2,5,10,16))

set.seed(1234)

rf_tune <- train( y ~ R + G + B + Hue + Saturation + Lightness,
                    data = df_caret,
                    method = 'rf',
                    metric = my_metric,
                    tuneGrid = rf_grid,
                    preProcess = c("center", "scale"),
                    trControl = my_ctrl,
                    trace = FALSE)

rf_tune$bestTune
##   mtry
## 3   10

RMSE for rf_tune:

# Access the resampling results
resampling_results <- rf_tune$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_rft <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_rft)
## [1] 0.07048404

Let’s now examine the tuned random forest behavior through predictions.

Predictions are made on the visualization grid, viz_grid, using the tuned random forest model rf_tune.

pred_viz_rf_tune_probs <- predict( rf_tune, newdata = viz_grid )

The pred_viz_rf_tune_probs predictions are column-bound to the viz_grid dataframe. The new object, viz_rf_tune_df, holds the tuned random forest prediction for each input combination in the visualization grid.

viz_rf_tune_df <- viz_grid %>% bind_cols(pred_viz_rf_tune_probs)
## New names:
## • `` -> `...7`
viz_rf_tune_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> -1.8785710, 0.8597316, -0.7193182, 0.8913525, 0.5103439, -0…

Plotting the tuned model

plot(rf_tune, xTrans=log)

From the above plot we can see how the RMSE of the Random Forest model varies as the number of predictors sampled at each split changes. Initially, with fewer predictors sampled (lower mtry values), the RMSE tends to be high; as mtry increases, the RMSE generally decreases. There is not much difference compared to the rf_default plot; the trend remains the same.

The glimpse above reveals that the unnamed ...7 column stores the tuned random forest predictions.

Visualizing the Predicted Probability of outcome for Random Forest Plot

The graph below shows the predicted outcome based on B and Hue for various shades of Lightness.

viz_rf_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_rf_df[, 7], color=B)) +
  geom_point(size=3)+
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()


The graph above is quite spread out. Note that the plotted values range well beyond [0, 1] (roughly -1.9 to 0.9), so these are regression predictions rather than calibrated probabilities.

Visualization of the Tuned Predicted Probability of outcome for Random Forest Plot

The graph below shows the predicted outcome based on B and Hue for various shades of Lightness after tuning.

viz_rf_tune_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_rf_tune_df[, 7], color = B)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

The plotted graph for the tuned random forest model (rf_tune) closely resembles that of the default random forest model (rf_default), indicating a negligible difference in RMSE between the two models. This similarity is expected since the difference in accuracy between rf_tune and rf_default is minimal. Consequently, the outcomes and probabilities derived from both models are expected to be practically identical.

It shows the relationship between the Hue values and the outcome, color-coded by the B values. Across different levels of Lightness, there seems to be varying patterns in the outcome as Hue changes. The color gradient indicates that higher B values are associated with certain patterns in the outcome across different Hue values.

Gradient Boosted Tree Model

The Gradient Boosted Tree model is trained and tuned below.
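Before fitting with caret, the core idea of gradient boosting for squared error can be sketched in a few lines: each round fits a small tree (here a depth-1 stump) to the current residuals and adds a shrunken copy of its predictions to the ensemble. This is a toy illustration on simulated data, not the project workflow:

```r
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.1)

# Fit a depth-1 regression stump to residuals r by scanning candidate cuts.
fit_stump <- function(x, r) {
  cuts <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))
  best <- NULL
  best_sse <- Inf
  for (ct in cuts) {
    left  <- mean(r[x <= ct])
    right <- mean(r[x >  ct])
    sse <- sum((r - ifelse(x <= ct, left, right))^2)
    if (sse < best_sse) {
      best_sse <- sse
      best <- list(cut = ct, left = left, right = right)
    }
  }
  best
}
predict_stump <- function(s, x) ifelse(x <= s$cut, s$left, s$right)

# Boosting loop: each stump is fit to the residuals of the current
# ensemble, and its predictions are added with a shrinkage factor
# (the analogue of gbm's n.trees and shrinkage parameters).
shrinkage <- 0.1
n_trees   <- 100
pred <- rep(0, length(y))
for (m in seq_len(n_trees)) {
  s <- fit_stump(x, y - pred)
  pred <- pred + shrinkage * predict_stump(s, x)
}

boost_rmse <- sqrt(mean((y - pred)^2))
```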

# Set the seed for reproducibility
set.seed(1234)
# Train the Gradient Boosted Trees model
gbm_default <- train(y ~ .,
                     data = df_caret,
                     method = 'gbm',
                     metric = "RMSE",
                     trControl = my_ctrl,
                     verbose = FALSE)

# Print the default GBM model
gbm_default$bestTune
##   n.trees interaction.depth shrinkage n.minobsinnode
## 9     150                 3       0.1             10

RMSE:

# Access the resampling results
resampling_results <- gbm_default$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_gbmd <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_gbmd)
## [1] 0.06040106

Examine the Gradient Boosted Trees behavior through predictions.

pred_viz_gbm_probs <- predict( gbm_default, newdata = viz_grid )

viz_gbm_df <- viz_grid %>% bind_cols(pred_viz_gbm_probs)
## New names:
## • `` -> `...7`
viz_gbm_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> -1.88644414, 0.70949683, -0.62528294, 0.89328712, 0.4850030…

Tuning and Training the Gradient Boosted Trees model

# Train and tune the Gradient Boosted Trees model directly in the train function
num_predictors <- ncol(df_caret) - 1  # Assuming the last column is the outcome variable


gbm_tune <- train(y ~ .,
                  data = df_caret,
                  method = 'gbm',
                  metric = "RMSE",  # Use RMSE as the evaluation metric
                  trControl = my_ctrl,
                  tuneGrid = expand.grid(.n.trees = c(100, 200, 300),
                                         .interaction.depth = seq(1, num_predictors),
                                         .shrinkage = c(0.01, 0.1, 0.3),
                                         .n.minobsinnode = c(5, 10, 20)),
                  verbose = FALSE)
gbm_tune$bestTune
##     n.trees interaction.depth shrinkage n.minobsinnode
## 105     300                 6       0.1             10

RMSE:

# Access the resampling results
resampling_results <- gbm_tune$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_gbmt <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_gbmt)
## [1] 0.05284694

Plotting Gradient Boosted Tree default and tuned model

plot(gbm_default, xTrans=log)

plot(gbm_tune, xTrans=log)

Tuned Gradient Boosted Trees model through predictions.

pred_viz_gbmt_probs <- predict( gbm_tune, newdata = viz_grid )

The pred_viz_gbmt_probs predictions are column-bound to the viz_grid dataframe. The new object, viz_gbmt_df, holds the tuned Gradient Boosted Tree prediction for each input combination in the visualization grid.

viz_gbmt_df <- viz_grid %>% bind_cols(pred_viz_gbmt_probs)
## New names:
## • `` -> `...7`
viz_gbmt_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22, 19, 159…
## $ G          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167, 51, 86…
## $ B          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 29, 87, 2…
## $ Lightness  <fct> dark, dark, dark, dark, dark, dark, dark, dark, deep, deep,…
## $ Saturation <fct> bright, bright, bright, bright, bright, bright, bright, bri…
## $ Hue        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, 27, 5, 1…
## $ ...7       <dbl> -1.80735282, 0.74697259, -0.59269925, 0.89433385, 0.5391875…

The glimpse reveals that the unnamed ...7 column stores the predictions from the tuned Gradient Boosted Tree model.

Visualization

For the default model:

viz_gbm_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_gbm_df[, 7], color=B)) +
  geom_point(size=3)+
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

For Tuned Model:

viz_gbmt_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_gbmt_df[, 7], color = B)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

Generalized Additive Models (GAM) with caret:

# Set seed for reproducibility
set.seed(1234)

gam_model <- train(
  y ~ .,
  data = df_caret,
  method = "gam",
  metric = "RMSE",
  trControl = my_ctrl,
  tuneGrid = expand.grid(
    select = c(0.1, 0.2, 0.3),  # Specify the smoothing parameter values to tune
    method = "GCV.Cp"            # Specify the method for tuning the smoothing parameter
  )
)
## Loading required package: mgcv
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## This is mgcv 1.9-0. For overview type 'help("mgcv-package")'.
# Print the best hyperparameters
gam_model$bestTune
##   select method
## 1    0.1 GCV.Cp

The output indicates that the best hyperparameters chosen by the tuning process are select = 0.1 and method = GCV.Cp. Here is what each of these means:

select: in caret's gam method this maps to mgcv's extra shrinkage penalty, which allows smooth terms to be penalized all the way to zero and effectively dropped from the model. It is essentially an on/off flag, which likely explains why every candidate value produces identical resampled results below: any nonzero value behaves the same way.

method: the criterion used to choose the smoothing parameters. Common choices include generalized cross-validation (GCV) and Mallows' Cp; GCV.Cp uses GCV when the scale parameter is unknown and Cp when it is known.
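A minimal standalone sketch of what caret delegates to mgcv, on simulated data (mgcv ships with R as a recommended package). The smooth term s(x) has its smoothness chosen by the GCV.Cp criterion, and select = TRUE turns on the extra shrinkage penalty:

```r
library(mgcv)  # recommended package shipped with R

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

# method = "GCV.Cp" picks the smoothing parameter by GCV;
# select = TRUE adds the extra penalty that can shrink terms to zero.
fit <- gam(y ~ s(x), method = "GCV.Cp", select = TRUE)

summary(fit)$r.sq   # adjusted R-squared of the smooth fit
```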

RMSE

# Access the resampling results
resampling_results <- gam_model$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_gamd <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_gamd)
## [1] 0.05680794

Prediction

pred_viz_gam_probs <- predict( gam_model, newdata = viz_grid )
viz_gam_df <- cbind(viz_grid, pred_viz_gam_probs)

viz_gam_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R                  <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22,…
## $ G                  <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167…
## $ B                  <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 2…
## $ Lightness          <fct> dark, dark, dark, dark, dark, dark, dark, dark, dee…
## $ Saturation         <fct> bright, bright, bright, bright, bright, bright, bri…
## $ Hue                <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, …
## $ pred_viz_gam_probs <dbl> -1.92328260, 1.08828385, -0.57187511, 1.01514636, 0…

The glimpse reveals that the pred_viz_gam_probs column stores the GAM predictions for each input combination in the visualization grid.

Tuning Generalized Additive Models (GAM):

tuning_results <- gam_model$results
print(tuning_results)
##   select method       RMSE  Rsquared        MAE      RMSESD   RsquaredSD
## 1    0.1 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
## 2    0.2 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
## 3    0.3 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
##         MAESD
## 1 0.003681035
## 2 0.003681035
## 3 0.003681035

A more refined GAM tuning grid is used below.

# Set seed for reproducibility
set.seed(1234)

gam_tune <- train(
  y ~ .,
  data = df_caret,
  method = "gam",
  metric = "RMSE",
  trControl = my_ctrl,
  tuneGrid = expand.grid(
    select = c(0.3, 3, 6),  # Specify the smoothing parameter values to tune
    method = "GCV.Cp"            # Specify the method for tuning the smoothing parameter
  )
)
# Print the best hyperparameters
gam_tune$results
##   select method       RMSE  Rsquared        MAE      RMSESD   RsquaredSD
## 1    0.3 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
## 2    3.0 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
## 3    6.0 GCV.Cp 0.05680794 0.9977628 0.04028667 0.006941342 0.0005118255
##         MAESD
## 1 0.003681035
## 2 0.003681035
## 3 0.003681035

RMSE

# Access the resampling results
resampling_results <- gam_tune$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_gamt <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_gamt)
## [1] 0.05680794

Prediction

pred_viz_gamt_probs <- predict( gam_tune, newdata = viz_grid )
viz_gamt_df <- cbind(viz_grid, pred_viz_gamt_probs)

viz_gamt_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R                   <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22…
## $ G                   <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 16…
## $ B                   <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, …
## $ Lightness           <fct> dark, dark, dark, dark, dark, dark, dark, dark, de…
## $ Saturation          <fct> bright, bright, bright, bright, bright, bright, br…
## $ Hue                 <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10,…
## $ pred_viz_gamt_probs <dbl> -1.92328260, 1.08828385, -0.57187511, 1.01514636, …

Visualization

For default gam model

viz_gam_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_gam_df[, 7], color=B)) +
  geom_point(size=3)+
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

For Tuned gam model

viz_gamt_df %>% 
  ggplot(mapping = aes(x = Hue, y = viz_gamt_df[, 7], color = B)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

KNN

K-Nearest Neighbors (KNN) is a simple yet effective algorithm used for both classification and regression tasks. It works by identifying the K nearest data points to a given query point and predicting the class or value based on the most common class or average value among its neighbors, respectively.
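For the regression case used here, the mechanics are easy to sketch in base R: compute distances from the query point to every training point, take the K smallest, and average their responses. The data below are made up for illustration:

```r
# KNN regression by hand: predict the mean response of the k nearest
# training points under Euclidean distance.
knn_predict <- function(train_x, train_y, query, k) {
  d <- sqrt(rowSums((t(t(train_x) - query))^2))  # distance to each training row
  nearest <- order(d)[seq_len(k)]                # indices of the k closest points
  mean(train_y[nearest])
}

train_x <- matrix(c(1, 1,
                    2, 2,
                    10, 10,
                    11, 11), ncol = 2, byrow = TRUE)
train_y <- c(1.0, 1.2, 5.0, 5.2)

knn_predict(train_x, train_y, query = c(1.5, 1.5), k = 2)  # -> 1.1
```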

We now train the KNN model using the train function from the caret package, specifying the method as “knn”. Since no metric is supplied, caret defaults to RMSE for this regression problem.

set.seed(1234)

knn_default <- train(y ~ .,
                     data = df_caret,
                     method = "knn",
                     trControl = my_ctrl,
                     tuneLength = 10)  # Try 10 candidate values of k

knn_default
## k-Nearest Neighbors 
## 
## 835 samples
##   6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 753, 752, 751, 750, 750, 752, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE        Rsquared   MAE       
##    5  0.09085295  0.9941956  0.06814833
##    7  0.09552074  0.9935983  0.06996092
##    9  0.09962224  0.9930040  0.07244288
##   11  0.10437028  0.9923264  0.07591964
##   13  0.10716197  0.9919102  0.07815402
##   15  0.11029943  0.9914546  0.08047162
##   17  0.11387491  0.9908989  0.08293891
##   19  0.11785148  0.9902752  0.08580610
##   21  0.12214235  0.9895572  0.08923681
##   23  0.12608584  0.9889255  0.09212414
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

From the above outcomes we can see that:

The k-nearest neighbors model was trained on 835 samples with 6 predictors and tuned over k values from 5 to 23 using repeated 10-fold cross-validation. The optimal model used k = 5, achieving the lowest resampled RMSE of approximately 0.0909 (with R-squared around 0.994 and MAE around 0.068). RMSE rises steadily as k increases, suggesting that averaging over more neighbors smooths away useful local structure in this dataset. KNN's simplicity and effectiveness make it a valuable baseline, especially for relatively small datasets.

RMSE

# Access the resampling results
resampling_results <- knn_default$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Calculate the mean RMSE across resamples
rmse_knnd <- mean(rmse_values, na.rm = TRUE)

# Print the mean resampled RMSE
print(rmse_knnd)
## [1] 0.09085295

Predictions to understand the behavior of the model

# Predictions to understand the behavior of the KNN model
pred_viz_knn_probs <- predict(knn_default, newdata = viz_grid)

# Combine predictions with visualization grid
viz_knn_df <- cbind(viz_grid, pred_viz_knn_probs)

# Display a glimpse of the combined dataframe
viz_knn_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R                  <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249, 22,…
## $ G                  <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126, 167…
## $ B                  <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, 90, 2…
## $ Lightness          <fct> dark, dark, dark, dark, dark, dark, dark, dark, dee…
## $ Saturation         <fct> bright, bright, bright, bright, bright, bright, bri…
## $ Hue                <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0, 10, …
## $ pred_viz_knn_probs <dbl> -1.6606001, -0.6901992, -0.4655066, 0.8582391, 0.11…

Plot for knn_default

Let's try to analyze it with a graph.

plot(knn_default,main="KNN Default Plot", xTrans=log)

The plot showcases the relationship between the number of neighbors (k) considered in the k-nearest neighbors algorithm and the corresponding cross-validated RMSE.

Tuning KNN

# Define a tuning grid for KNN
knn_grid <- expand.grid(k = seq(1, 20, by = 2))  # Define the range of K values

set.seed(1234)

knn_tune <- train(y ~ .,
                  data = df_caret,
                  method = "knn",
                  metric= my_metric,
                  trControl = my_ctrl,
                  tuneGrid = knn_grid)

knn_tune
## k-Nearest Neighbors 
## 
## 835 samples
##   6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 753, 752, 751, 750, 750, 752, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE        Rsquared   MAE       
##    1  0.12302331  0.9894370  0.09078516
##    3  0.09705306  0.9933391  0.07262540
##    5  0.09085295  0.9941956  0.06814833
##    7  0.09552074  0.9935983  0.06996092
##    9  0.09962224  0.9930040  0.07244288
##   11  0.10437028  0.9923264  0.07591964
##   13  0.10716197  0.9919102  0.07815402
##   15  0.11029943  0.9914546  0.08047162
##   17  0.11387491  0.9908989  0.08293891
##   19  0.11785148  0.9902752  0.08580610
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

RMSE

# Access the resampling results
resampling_results <- knn_tune$resample

# Extract RMSE values
rmse_values <- resampling_results$RMSE

# Average the cross-validated RMSE values for the final model
rmse_knnt <- mean(rmse_values, na.rm = TRUE)

# Print the RMSE
print(rmse_knnt)
## [1] 0.09085295

Predictions to understand the behavior of the tuned KNN model

# Predictions to understand the behavior of the tuned KNN model
pred_viz_knn_tune_probs <- predict(knn_tune, newdata = viz_grid)

# Combine predictions with visualization grid
viz_knn_tune_df <- cbind(viz_grid, pred_viz_knn_tune_probs)

# Display a glimpse of the combined dataframe
viz_knn_tune_df %>% glimpse()
## Rows: 784
## Columns: 7
## $ R                       <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249…
## $ G                       <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126…
## $ B                       <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, …
## $ Lightness               <fct> dark, dark, dark, dark, dark, dark, dark, dark…
## $ Saturation              <fct> bright, bright, bright, bright, bright, bright…
## $ Hue                     <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0,…
## $ pred_viz_knn_tune_probs <dbl> -1.6606001, -0.6901992, -0.4655066, 0.8582391,…

Plotting the Tuned Model

# Plot the performance of the tuned KNN model

plot(knn_tune, main = "KNN Tuned Plot", xTrans = log)

Predictions

# Predictions for the default KNN model
pred_viz_knn_probs_default <- predict(knn_default, newdata = viz_grid)

# Combine predictions with the visualization grid
viz_knn_df_default <- bind_cols(viz_grid, as.data.frame(pred_viz_knn_probs_default))

# Glimpse the combined dataframe
glimpse(viz_knn_df_default)
## Rows: 784
## Columns: 7
## $ R                          <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, …
## $ G                          <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, …
## $ B                          <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 2…
## $ Lightness                  <fct> dark, dark, dark, dark, dark, dark, dark, d…
## $ Saturation                 <fct> bright, bright, bright, bright, bright, bri…
## $ Hue                        <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19,…
## $ pred_viz_knn_probs_default <dbl> -1.6606001, -0.6901992, -0.4655066, 0.85823…
# Predictions for the tuned KNN model
pred_viz_knn_probs_tune <- predict(knn_tune, newdata = viz_grid)

# Combine predictions with the visualization grid
viz_knn_df_tune <- bind_cols(viz_grid, as.data.frame(pred_viz_knn_probs_tune))

# Glimpse the combined dataframe
glimpse(viz_knn_df_tune)
## Rows: 784
## Columns: 7
## $ R                       <int> 132, 15, 243, 177, 169, 240, 119, 230, 86, 249…
## $ G                       <int> 89, 255, 127, 232, 228, 171, 220, 19, 141, 126…
## $ B                       <int> 115, 176, 174, 211, 40, 136, 94, 92, 114, 28, …
## $ Lightness               <fct> dark, dark, dark, dark, dark, dark, dark, dark…
## $ Saturation              <fct> bright, bright, bright, bright, bright, bright…
## $ Hue                     <int> 10, 7, 13, 1, 32, 30, 8, 8, 15, 29, 10, 19, 0,…
## $ pred_viz_knn_probs_tune <dbl> -1.6606001, -0.6901992, -0.4655066, 0.8582391,…

Visualization

For the default KNN model

viz_knn_df %>% 
  ggplot(mapping = aes(x = Hue, y = pred_viz_knn_probs, color = B)) +
  geom_point(size=3)+
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

For the tuned KNN model

viz_knn_tune_df %>% 
  ggplot(mapping = aes(x = Hue, y = pred_viz_knn_tune_probs, color = B)) +
  geom_point(size = 3) +  # Adjust the size as needed
  facet_wrap(~Lightness, labeller = 'label_both') +
  scale_color_gradient(low = "#00FFFF", high = "#104E8B") +
  theme_bw()

For most of the models above:

  1. Accuracy as the Evaluation Metric:
    • Accuracy is a straightforward metric that measures the proportion of correctly classified instances out of the total instances.
    • It is easy to interpret and understand, making it a popular choice for classification tasks.
    • Especially in balanced datasets (where the classes are roughly equal in size), accuracy provides a good overall measure of model performance.

  2. k-Fold Cross-Validation:
    • k-Fold Cross-Validation is a robust resampling technique for estimating the performance of a predictive model.
    • It divides the dataset into k equal-sized folds and iteratively trains the model on k-1 folds while using the remaining fold for validation.
    • This process is repeated k times, with each fold used exactly once as the validation set.
    • It helps in reducing bias and variance in model evaluation by using multiple train-test splits of the dataset.
    • k-Fold Cross-Validation provides a more accurate estimate of model performance compared to a single train-test split, especially when the dataset is limited.

By combining accuracy as the evaluation metric with k-fold cross-validation as the resampling scheme, you ensure that:

  • The model is evaluated based on its ability to correctly classify instances.
  • The model’s performance is assessed robustly across multiple train-test splits of the data, reducing the risk of overfitting or underfitting.
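The repeated k-fold scheme reported in the tuning output above (10 folds, repeated 3 times) can be specified in caret roughly as follows. This is a sketch of how the `my_ctrl` and `my_metric` objects used throughout this report could be defined; it is not the report’s exact code:

```r
library(caret)

# Sketch: repeated 10-fold cross-validation control object, matching the
# "Cross-Validated (10 fold, repeated 3 times)" summary shown above.
my_ctrl <- trainControl(method = "repeatedcv",
                        number = 10,   # 10 folds
                        repeats = 3)   # repeated 3 times

# RMSE as the selection metric for the regression models
my_metric <- "RMSE"
```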

Identifying the best model

library(ggplot2)

# Create a data frame with model names and RMSE values
model_rmse <- data.frame(Model = c("Neural Network", "Random Forest", "GBM", "GAM", "KNN"),
                          RMSE = c(rmse_nnt, rmse_rft, rmse_gbmd, rmse_gamd, rmse_knnd))

# Plot the bar graph
ggplot(model_rmse, aes(x = Model, y = RMSE, fill = Model)) +
  geom_bar(stat = "identity") +
  labs(title = "RMSE Values for Different Models",
       x = "Model",
       y = "RMSE") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

#### The best model according to RMSE is GAM.

You must decide the resampling scheme - that resampling scheme must be applied to ALL models!

The resampling scheme utilized for training, tuning, and interpreting the models is k-fold cross-validation.

Cross-validation is a technique used to assess the performance of a predictive model by dividing the dataset into subsets, training the model on a portion of the data, and evaluating its performance on the remaining data. This process is repeated multiple times to ensure robustness and reliability of the model’s performance metrics.

Since we need the data split into subsets so that a model fit on one part can be evaluated on the remaining data, I chose cross-validation as my resampling scheme.

Different models have different preprocessing requirements.

You must decide the appropriate preprocessing options you should consider

Data preprocessing was conducted to handle any missing or NaN values, ensuring the datasets were clean and ready for modeling. This step was carried out prior to training each model, and any necessary data cleaning procedures were implemented as part of this preprocessing stage.

The Pre-processing included:

  • Categorical input values were converted into integers for compatibility with ML models.
  • The response variable y underwent a logit transformation for preprocessing.
  • Handling of NaN values, negative values, and outliers was performed during preprocessing.
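The preprocessing steps listed above can be sketched as follows. This is an illustration only: the helper name `logit_transform`, the epsilon guard, and the assumption that the response has been rescaled to (0, 1) before the logit are my additions, not the report’s exact code:

```r
# Sketch of the preprocessing described above.
# Assumes the response y has already been rescaled to lie in (0, 1).
logit_transform <- function(y, eps = 1e-6) {
  y <- pmin(pmax(y, eps), 1 - eps)  # guard against exact 0 or 1
  log(y / (1 - y))                  # logit transformation
}

# Categorical inputs converted to integer codes for model compatibility
df_caret <- trained_data %>% 
  dplyr::mutate(dplyr::across(where(is.character),
                              ~ as.integer(as.factor(.x))))
```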

You must identify the performance metrics you will focus on to compare the models.

You must identify the best model.

The performance metrics used for training, tuning, and interpreting the models are as follows:

  1. Linear models - RMSE
  2. Regularized regression with Elastic net - RMSE
  3. Neural Network - RMSE
  4. Random Forest - RMSE
  5. Gradient Boosted Tree - RMSE
  6. Support Vector Machine (SVM) - RMSE
  7. KNN - RMSE
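RMSE, the metric shared across these comparisons, is simply the square root of the mean squared prediction error. A minimal sketch with hypothetical vectors `y` and `y_hat`:

```r
# RMSE: square root of the mean squared prediction error
rmse <- function(y, y_hat) {
  sqrt(mean((y - y_hat)^2))
}

# Hypothetical observed and predicted values
y     <- c(1.0, 2.0, 3.0)
y_hat <- c(1.1, 1.9, 3.2)
rmse(y, y_hat)
## [1] 0.1414214
```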

Comparing the RMSE values calculated above, GAM achieves the lowest RMSE and is therefore the best model for predicting the responses.

Saving the models

viz_grid %>% readr::write_rds("viz_grid.rds")
enet_tune_mod10 %>% readr::write_rds("enet_tune_mod10.rds")
gam_tune %>% readr::write_rds("gam_tune.rds")